Abstract

In 1977, Keane and Smorodinsky [12] showed that there exists a finitary homomorphism from any finite-alphabet Bernoulli process to any other finite-alphabet Bernoulli process of strictly lower entropy. In 1996, Serafin [17] proved the existence of a finitary homomorphism with finite expected coding length. In this paper, we construct such a homomorphism in which the coding length has exponential tails. Our construction is source-universal, in the sense that it does not use any information on the source distribution other than the alphabet size and a bound on the entropy gap between the source and target distributions. We also indicate how our methods can be extended to prove a source-specific version of the result for Markov chains.



1 Introduction 

Let $a, b \in \mathbb{N}$, and define the two finite alphabets $A = \{i \in \mathbb{Z} : 1 \le i \le a\}$, $B = \{i \in \mathbb{Z} : 1 \le i \le b\}$. Equip the sequence spaces $A^{\mathbb{Z}}$ and $B^{\mathbb{Z}}$ with the product $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$ respectively. A measurable map $\varphi : \Omega \to B^{\mathbb{Z}}$, where $\Omega \subset A^{\mathbb{Z}}$ is measurable, is called translation-equivariant if for all $x = (x_i)_{i \in \mathbb{Z}} \in \Omega$, the left shift $T(x) = (x_{i+1})_{i \in \mathbb{Z}}$ of $x$ is also in $\Omega$ and the equality $\varphi(T(x)) = T(\varphi(x))$ holds. A translation-equivariant map $\varphi : \Omega \to B^{\mathbb{Z}}$ is finitary if for all $x \in \Omega$, there exists an $N \in \mathbb{N}$ such that for all $y \in \Omega$, if $(x_i)_{|i| \le N} = (y_i)_{|i| \le N}$ then $\varphi(x)_0 = \varphi(y)_0$. In this case we let $N_\varphi(x)$ be the minimal such $N$, and call $N_\varphi$ the coding length of $\varphi$.

If $p = (p(i))_{i \in A}$ is a probability vector (that is to say, $p(i) \ge 0$ for all $i \in A$, and $\sum_{i \in A} p(i) = 1$), let $P_p$ be the product measure $p^{\mathbb{Z}}$ on $\mathcal{A}$. The quadruple $\mathbb{B}(p) = (A^{\mathbb{Z}}, \mathcal{A}, P_p, T)$ is the Bernoulli shift of $p$. Similarly if $q = (q(i))_{i \in B}$ is a probability vector on $B$, let $P_q$ be the product measure $q^{\mathbb{Z}}$ on $\mathcal{B}$, and let $\mathbb{B}(q) = (B^{\mathbb{Z}}, \mathcal{B}, P_q, T)$ be the Bernoulli shift of $q$. A homomorphism $\varphi$ from $\mathbb{B}(p)$ to $\mathbb{B}(q)$ is a translation-equivariant map $\varphi : \Omega \to B^{\mathbb{Z}}$ with the properties that $\Omega \in \mathcal{A}$, $P_p(\Omega) = 1$, and for all $E \in \mathcal{B}$ we have $P_p(\varphi^{-1}(E)) = P_q(E)$.

Denote by $h(p) = -\sum_i p(i) \log p(i)$ the entropy of a probability vector. Keane and Smorodinsky [12] proved that if $h(p) > h(q)$, then there exists a finitary homomorphism $\varphi$ from $\mathbb{B}(p)$ to $\mathbb{B}(q)$. Serafin [17] demonstrated that $\varphi$ may be chosen in such a way that the expected coding length $E_p(N_\varphi)$ is finite. Iwanik and Serafin [10] strengthened this result to all moments below the second.

We say that a finitary homomorphism has exponential tails if there exist $c > 0$ and $0 < d < 1$ such that $P_p(N_\varphi > n) \le c \cdot d^n$ for all $n$. In general, we say that a non-negative sequence $(c_n)_{n \in \mathbb{N}}$ decays exponentially if there exist $c > 0$ and $0 < d < 1$ such that $c_n \le c \cdot d^n$ for all $n$. We say that a random variable $W$ has exponential tails if $(P(|W| > n))_{n \in \mathbb{N}}$ decays exponentially.

Our main result is a new construction of a finitary homomorphism from $\mathbb{B}(p)$ to $\mathbb{B}(q)$, when $h(p) > h(q)$. Our construction improves on the above-mentioned results in two ways. First, the coding length will have exponential tails. Second, the homomorphism is source-universal, in the sense that the same function works simultaneously for all source vectors $p$ over a given alphabet which have full support and whose entropy is greater than $h(q)$ by at least a given $\varepsilon$. In particular, this answers the open problem mentioned in the last two lines of [14].

The precise result is the following: 

Theorem 1 Fix a probability vector $q = (q(i))_{i \in B}$, and fix $\varepsilon > 0$. There exists a measurable subset $\Omega \subset A^{\mathbb{Z}}$ and a finitary translation-equivariant map $\varphi : \Omega \to B^{\mathbb{Z}}$, such that for any probability vector $p = (p(i))_{i \in A}$ for which $p(i) > 0$ for all $i \in A$ and $h(p) \ge h(q) + \varepsilon$, $\varphi$ is a finitary homomorphism from $\mathbb{B}(p)$ to $\mathbb{B}(q)$ with exponential tails.

Here is a brief description of the motivation and method of the proof of Theorem 1. A homomorphism can be thought of as a translation-equivariant function that, given a sequence of independent samples with distribution $p$, simulates a sequence of independent samples with distribution $q$. Thus, it is natural to try to construct such a function using existing constructions of functions that simulate one discrete distribution using another. Such constructions have been described in [3], [?], [13], [16], [14].

Our construction combines elements from several of these constructions. First, divide the source sequence $(x_k)_{k \in \mathbb{Z}}$ into blocks separated by markers. A marker is defined as an appearance of a certain (sufficiently rare) pattern, say a 2 followed by $t-1$ 1's, where $t$ is a large enough integer. Next, feed the contents of each block into a specially designed function which converts these approximately independent $p$-distributed samples into independent unbiased bits. Next, at each block, start feeding these unbiased bits into another function designed to generate a number of independent samples of the distribution $q$ sufficient to fill the length of that block. This function may require more bits than that block contains, but on average, because of entropy considerations, it requires fewer. Any unused bits are then used to satisfy blocks whose simulation did not end in the first round. Continuing in this manner one obtains the required number of samples of $q$, which are then used to generate the value $\varphi(x)$. Everything is done simultaneously for all blocks in a translation-equivariant manner, with an added bonus being the source-universality property.

The following is an extension of Theorem 1 to Markov chains.

Theorem 2 Let $\alpha = (\alpha_{ij})_{i,j \in A}$, $\beta = (\beta_{ij})_{i,j \in B}$ be two aperiodic, irreducible Markov transition matrices over the finite alphabets $A, B$. Let $\mathcal{M}(\alpha) = (A^{\mathbb{Z}}, \mathcal{A}, P_\alpha, T)$, $\mathcal{M}(\beta) = (B^{\mathbb{Z}}, \mathcal{B}, P_\beta, T)$ be the stationary Markov shifts of $\alpha$ and $\beta$ respectively, and denote their entropies by $h(\alpha), h(\beta)$. If $h(\alpha) > h(\beta)$, then there exists a finitary homomorphism from $\mathcal{M}(\alpha)$ to $\mathcal{M}(\beta)$ with exponential tails.

We indicate in Section 5 how our methods may be adapted to prove Theorem 2. For Markov chains, our construction is not source-universal, except in the weak sense that it will work, under the assumption of an entropy gap, simultaneously for all Markov transition matrices with all entries positive; see Section 5.

2 Simulations 

In this section, we construct two procedures for simulating one discrete distribution from another. The constructions are variants of those used by Elias [3], Han and Hoshi [?], Knuth and Yao [13], and Romik [16]. In all of these constructions, a key point is that the loss in entropy is small.

2.1 Simulating a distribution from independent unbiased random bits

For $b \in \mathbb{N}$, let $B = \{i \in \mathbb{Z} : 1 \le i \le b\}$ be a finite alphabet, and let $q = (q(i))_{i \in B}$ be a probability vector. Let $(\{0,1\}^{\mathbb{N}}, \mathcal{F}, P)$ be the probability space of (one-sided) infinite binary strings, equipped with the natural product $\sigma$-algebra, and the probability measure under which the coordinate functions are independent unbiased random bits. Let $E$ denote expectation with respect to the measure $P$.

A simulation of $q$ from independent unbiased bits is a pair of measurable functions $T : \{0,1\}^{\mathbb{N}} \to \mathbb{N}$ and $S : \{0,1\}^{\mathbb{N}} \to B$, defined $P$-a.s., with the following properties:

(i) If $x = (x_i)_{i \in \mathbb{N}}$ and $\tilde{x} = (\tilde{x}_i)_{i \in \mathbb{N}}$ are elements of $\{0,1\}^{\mathbb{N}}$ such that $(x_1, \ldots, x_{T(x)}) = (\tilde{x}_1, \ldots, \tilde{x}_{T(x)})$, then $T(\tilde{x}) = T(x)$ and $S(\tilde{x}) = S(x)$.

(ii) Under the measure $P$, $S(x)$ has distribution $q$.

$T$ is called the stopping time of the simulation, and $S$ is called the output symbol.

The following theorem was first proved by Knuth and Yao [13]. We present an independent proof with a more explicit construction.

Theorem 3 There exists a simulation $(T, S)$ of $q$ from independent unbiased bits satisfying the additional properties:

(i) $T(x)$ has exponential tails. More precisely, $P(T(x) > k) \le (b-1)\,2^{-k}$ for all $k \in \mathbb{N}$.

(ii) $E(T(x)) \le \dfrac{h(q)}{\log 2} + 6$.
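Proof. Construct $T$ and $S$ as follows. Define a partition of $[0,1]$ by $0 = Q_0 \le Q_1 \le \cdots \le Q_b = 1$, where $Q_j = \sum_{i=1}^{j} q(i)$. Define

$$T(x) = \min\Big\{k \in \mathbb{N} : Q_{j-1} \le \sum_{i=1}^{k} x_i 2^{-i} \le Q_j - 2^{-k} \text{ for some } 1 \le j \le b\Big\},$$

$$S(x) = \text{the unique } 1 \le j \le b \text{ for which } Q_{j-1} \le \sum_{i=1}^{T(x)} x_i 2^{-i} \le Q_j - 2^{-T(x)}.$$

In words, the idea is to consider $x = (x_i)_{i \in \mathbb{N}}$ as the binary expansion of a number $u = \sum_{i=1}^{\infty} x_i 2^{-i} \in [0,1]$, and to define the output symbol as that $1 \le j \le b$ for which $u \in (Q_{j-1}, Q_j)$. Determining the correct $j$ necessitates looking at only the first $T(x)$ bits in the binary expansion of $u$. So $T(x)$, $S(x)$ are defined for all $x$ which are not the binary expansions of any of the $Q_j$, and property (i) in the definition of a simulation is clearly satisfied. Also, since under the measure $P$ the number $u$ is uniformly distributed in $[0,1]$, we have that

$$P(S(x) = j) = \text{Lebesgue measure of } (Q_{j-1}, Q_j) = q(j),$$

so property (ii) is also satisfied, and $(T, S)$ is indeed a simulation of $q$ from independent unbiased bits. To prove the additional properties claimed in the theorem, note that $T(x) > k$ holds exactly when one of the points $Q_1, \ldots, Q_{b-1}$ lies in the interior of the dyadic interval $I_k(x) = \big[\sum_{i=1}^{k} x_i 2^{-i},\ \sum_{i=1}^{k} x_i 2^{-i} + 2^{-k}\big]$. Each $Q_j$ lies in the interior of at most one dyadic interval of length $2^{-k}$, so

$$P(T(x) > k) \le \sum_{j=1}^{b-1} P\big(Q_j \in \operatorname{int} I_k(x)\big) \le (b-1)\,2^{-k},$$

establishing (i). For (ii), let, for $0 \le j \le b$,

$$Q_j = \sum_{i=1}^{\infty} a_{j,i} 2^{-i}, \qquad a_{j,i} \in \{0,1\},\ i \in \mathbb{N},$$

be binary expansions of the $Q_j$. Then, checking the definitions, we see that for $l > m_j$,

$$\{x : S(x) = j,\ T(x) = l\} = \{x : (x_1, \ldots, x_l) = (a_{j-1,1}, \ldots, a_{j-1,l-1}, 1),\ a_{j-1,l} = 0\} \cup \{x : (x_1, \ldots, x_l) = (a_{j,1}, \ldots, a_{j,l-1}, 0),\ a_{j,l} = 1\}, \tag{1}$$

while for $l \le m_j$ this event is empty. Since

$$E(T(x)) = \sum_{j=1}^{b} \sum_{l=1}^{\infty} l \cdot P(T(x) = l,\ S(x) = j),$$

(ii) will follow if we prove that for all $1 \le j \le b$,

$$\sum_{l=1}^{\infty} l \cdot P(T(x) = l,\ S(x) = j) \le -\frac{q(j) \log q(j)}{\log 2} + 6\,q(j).$$

Indeed, by using (1) we see that the representation

$$q(j) = P(S(x) = j) = \sum_{l=m_j}^{\infty} P(S(x) = j,\ T(x) = l)$$

can be rewritten as a representation of $q(j)$ as a sum of negative powers of 2, namely

$$q(j) = \sum_{i=1}^{\infty} 2^{-n_{j,i}},$$

where each summand $2^{-n_{j,i}}$ is the probability of one of the two events on the right-hand side of (1). Arrange the $n_{j,i}$ such that $n_{j,1} \le n_{j,2} \le n_{j,3} \le \cdots$, and note also that $n_{j,i} < n_{j,i+2}$ for all $i \in \mathbb{N}$, since any given power of 2 appears at most twice in the sum. Then in particular we get

$$2^{-n_{j,1}} \le q(j) \le 2^{-n_{j,1}} + 2^{-n_{j,1}} + 2^{-n_{j,1}-1} + 2^{-n_{j,1}-1} + 2^{-n_{j,1}-2} + \cdots = 2\Big(1 + \frac{1}{2} + \frac{1}{4} + \cdots\Big) 2^{-n_{j,1}} = 2^{2-n_{j,1}},$$

so $n_{j,1} \le -\log q(j)/\log 2 + 2$. Since each value of $n_{j,i} - n_{j,1}$ occurs at most twice, this then gives

$$\begin{aligned}
\sum_{l=1}^{\infty} l \cdot P(T(x) = l,\ S(x) = j) &= \sum_{i=1}^{\infty} n_{j,i} 2^{-n_{j,i}} = n_{j,1}\, q(j) + 2^{-n_{j,1}} \sum_{i=1}^{\infty} (n_{j,i} - n_{j,1})\, 2^{-(n_{j,i} - n_{j,1})} \\
&\le \Big(-\frac{\log q(j)}{\log 2} + 2\Big) q(j) + 2 \cdot 2^{-n_{j,1}} \sum_{k=0}^{\infty} k\, 2^{-k} \\
&\le -\frac{q(j) \log q(j)}{\log 2} + 6\,q(j),
\end{aligned}$$

as required. □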
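The interval construction in the proof of Theorem 3 is easy to try out numerically. The Python sketch below is our own illustration, not code from the paper: the names `interval_simulation`, `random_bits` and `entropy_bits` are hypothetical, $q$ is passed as exact `Fraction`s so that the comparisons with the $Q_j$ are exact, symbols are numbered $1, \ldots, b$, and a seeded pseudo-random generator stands in for the source of unbiased bits.

```python
from fractions import Fraction
from itertools import accumulate
import math
import random


def interval_simulation(q, bits):
    """Simulate q from unbiased bits by the interval method (sketch of Theorem 3).

    q    -- list of Fractions q(1), ..., q(b) summing to 1
    bits -- iterator of 0/1 values x_1, x_2, ...
    Returns (T, S): the number of bits read and the output symbol in {1, ..., b}.
    """
    Q = [Fraction(0)] + list(accumulate(q))  # Q_0 = 0 <= Q_1 <= ... <= Q_b = 1
    low, width = Fraction(0), Fraction(1)
    k = 0
    while True:
        k += 1
        width /= 2
        low += next(bits) * width  # low = sum_{i <= k} x_i 2^{-i}
        # Stop as soon as [low, low + 2^{-k}] fits inside some [Q_{j-1}, Q_j].
        for j in range(1, len(Q)):
            if Q[j - 1] <= low and low + width <= Q[j]:
                return k, j


def random_bits(seed):
    """Infinite stream of pseudo-random unbiased bits from a seeded generator."""
    rng = random.Random(seed)
    while True:
        yield rng.getrandbits(1)


def entropy_bits(q):
    """Base-2 entropy h(q)/log 2 of a probability vector."""
    return -sum(float(p) * math.log2(float(p)) for p in q if p > 0)
```

For example, with $q = (1/2, 1/4, 1/4)$ the input bits 1, 0 stop after two bits with output symbol 2, because $[1/2, 3/4] = [Q_1, Q_2]$.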
Remarks. Knuth and Yao [13] proved Theorem 3 with the better constant 2 replacing 6 in Theorem 3(ii). Their construction is described in slightly less concrete terms than the one above, but it is optimal, in the sense that the constant 2 is sharp, and in the strong sense that the stopping time is stochastically dominated by the stopping time of any possible simulation of $q$ using independent unbiased bits. For more information see Section 5.12 of [?].

A generalization of the construction given above to simulation of $q$ from a source of independent samples of an arbitrary probability vector $p$ was given by Han and Hoshi [?] and independently by Romik [16], both of whom proved a statement analogous to Theorem 3(ii), with the base-2 entropy of $q$ replaced by the ratio $h(q)/h(p)$, and with the constant 6 replaced by some function of the vector $p$.

2.2 Simulating independent unbiased random bits from a block with an excluded pattern

For $a \in \mathbb{N}$, let $A = \{i \in \mathbb{Z} : 1 \le i \le a\}$ be a finite alphabet, and let $p = (p(i))_{i \in A}$ be a probability vector. For each $n \in \mathbb{N}$, let $(A^n, p^n)$ be the discrete elementary probability space of $A$-valued $n$-tuples, with the i.i.d. product measure with marginal probabilities $p$. In the case $a = 2$ of a binary distribution, Elias [3] constructed a function that, given an $A^n$-valued, $p^n$-distributed input, simulates a random number of independent unbiased random bits; that is, a pair $(N, F)$, where $N$ is an $\mathbb{N}$-valued random variable, and for each $k \in \mathbb{N}$, given $N = k$, the random vector $F$ is distributed uniformly on the set $\{0,1\}^k$.

We shall need a generalization of Elias's construction, which takes as input 
n independent samples from a general discrete distribution, conditioned on 
the non-appearance of a certain pattern, and returns a random number of 
independent unbiased random bits. 

For any $t \in \mathbb{N}$, let $E_{n,t}$ be the subset of $A^n$ consisting of all vectors $x = (x_1, \ldots, x_n) \in A^n$ for which for no $i \in \{1, \ldots, n-t+1\}$ is it true that

$$x_i = 2, \quad x_{i+1} = x_{i+2} = \cdots = x_{i+t-1} = 1.$$

That is, $E_{n,t}$ contains all vectors which, considered as words, do not contain the pattern "2 followed by $t-1$ 1's". We sometimes call such vectors "pattern-free". Let $p_{n,t}$ be the measure $p^n$ conditioned on $E_{n,t}$. Let $\{0,1\}^* = \bigcup_{k=0}^{\infty} \{0,1\}^k$ be the set of finite strings over the alphabet $\{0,1\}$.
Theorem 4 For any $n, t \in \mathbb{N}$ there exist functions

$$N_{n,t} : E_{n,t} \to \{0, 1, 2, \ldots\}, \qquad F_{n,t} : E_{n,t} \to \{0,1\}^*, \qquad G_{n,t} : E_{n,t} \to \{1, 2, \ldots, (n+1)^{a-1}\},$$

with the properties:

(i) Under the measure $p_{n,t}$, for each $k \in \{0, 1, 2, \ldots\}$ for which $p_{n,t}(N_{n,t}(x) = k) > 0$, conditioned on $N_{n,t}(x) = k$, the random vector $F_{n,t}(x)$ is uniformly distributed on $\{0,1\}^k$.

(ii) The function $x \mapsto (N_{n,t}(x), F_{n,t}(x), G_{n,t}(x))$ is injective.

(iii) For any $x \in E_{n,t}$ we have $N_{n,t}(x) \le (\log a / \log 2)\, n$.

Before going on with the proof, we explain briefly the idea behind this construction and its importance in what follows. The functions $N_{n,t}, F_{n,t}, G_{n,t}$ accept as input a $p_{n,t}$-distributed random variable and produce a binary string, $F_{n,t}(x)$, of length $N_{n,t}(x)$. Conditioned on the number of bits, the binary string is distributed uniformly over all binary strings of that length; in other words, it contains $N_{n,t}(x)$ independent unbiased bits. We would like to ensure that the construction is efficient, i.e., extracts enough information from the input; this is guaranteed by claim (ii), which states that adding the complementary information $G_{n,t}(x)$ makes the function injective, and so enables reconstruction of the input, together with the statement that the range of $G_{n,t}(x)$ is relatively small, so the amount of entropy contained in it is limited. As for claim (iii), it will be used in our proof that the homomorphism we construct has exponential tails.

Proof of Theorem 4. Throughout the proof, for convenience we shall consider $n$ and $t$ as fixed and in most places omit reference to the dependence of the various quantities on them. To construct the functions $N_{n,t}, F_{n,t}, G_{n,t}$, we first divide $E_{n,t}$ into classes of equiprobable elements. We do this as follows. Let

$$C = \{m = (m_1, m_2, \ldots, m_a) \in \mathbb{Z}^a : m_i \ge 0,\ 1 \le i \le a,\ m_1 + \cdots + m_a = n\}.$$

For $x = (x_1, \ldots, x_n) \in E_{n,t}$ and $1 \le i \le a$, let

$$c_i(x) = \#\{1 \le j \le n : x_j = i\},$$

and let

$$\mathrm{count}(x) = (c_1(x), \ldots, c_a(x)) \in C.$$

Then clearly $E_{n,t}$ can be written as the disjoint union

$$E_{n,t} = \bigcup_{m \in C} \{x \in E_{n,t} : \mathrm{count}(x) = m\} =: \bigcup_{m \in C} D_m.$$

For each $m \in C$, we have for all $x \in D_m$ that

$$p^n(x) = p(1)^{m_1} p(2)^{m_2} \cdots p(a)^{m_a},$$

so

$$p_{n,t}(x) = \frac{p(1)^{m_1} p(2)^{m_2} \cdots p(a)^{m_a}}{p^n(E_{n,t})}.$$

In other words, all elements of $D_m$ are equiprobable under $p_{n,t}$. Now, for each $m \in C$, let $d_m = |D_m|$, and write

$$d_m = \sum_{i=1}^{s_m} 2^{r_{m,i}}, \qquad r_{m,1} > r_{m,2} > \cdots > r_{m,s_m} \ge 0,$$

for the binary expansion of $d_m$. The functions $N_{n,t}$, $F_{n,t}$ and $G_{n,t}$ may now be defined as follows. For each $m \in C$, arrange the elements of $D_m$ in lexicographical order, and for each $x \in D_m$ denote by $\mathrm{rank}(x)$ the position of $x$ in this order. Set for each $x \in D_m$

$$N_{n,t}(x) = r_{m,k^*(x)}, \qquad k^*(x) = \min\Big\{1 \le k \le s_m : \sum_{i=1}^{k} 2^{r_{m,i}} \ge \mathrm{rank}(x)\Big\},$$

$F_{n,t}(x)$ = the length-$N_{n,t}(x)$ binary expansion of the number $\sum_{i=1}^{k^*(x)} 2^{r_{m,i}} - \mathrm{rank}(x)$,

$G_{n,t}(x)$ = the position of $m$ in the lexicographical order on $C$.

The functions $N_{n,t}, F_{n,t}, G_{n,t}$ are clearly defined on all of $E_{n,t}$ and have the desired range. In words, we have used the lexicographical order to give an explicit partition of $D_m$ into subsets of sizes $2^{r_{m,i}}$, $1 \le i \le s_m$. On each subset of size $2^{r_{m,i}}$ we define $N_{n,t} = r_{m,i}$, and for the value of $F_{n,t}$ assign to the $2^{r_{m,i}}$ possible elements the $2^{r_{m,i}}$ different binary strings of length $r_{m,i}$. This implies claim (i), since the elements of $D_m$ are equiprobable. The function $G_{n,t}$ is defined so as to encode the residual information needed to recover the value of $x$ given $F_{n,t}(x)$. Indeed, if $x, x' \in E_{n,t}$ and $x \ne x'$, then either $x \in D_m$, $x' \in D_{m'}$ for some $m \ne m'$, in which case clearly $G_{n,t}(x) \ne G_{n,t}(x')$, or $x, x'$ are in the same $D_m$ but $\mathrm{rank}(x) \ne \mathrm{rank}(x')$, whence $F_{n,t}(x) \ne F_{n,t}(x')$. This proves claim (ii). Finally, claim (iii) is immediate, since for all $m \in C$, $1 \le i \le s_m$ we have $2^{r_{m,i}} \le d_m \le |E_{n,t}| \le a^n$, so $N_{n,t}(x) = r_{m,k^*(x)} \le (\log a / \log 2)\, n$. □
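To make the counting argument concrete, here is a brute-force Python sketch of $N_{n,t}$, $F_{n,t}$ and $G_{n,t}$ for small $n$. It is an illustration under our own conventions (hypothetical function names, words as tuples over $\{1, \ldots, a\}$, 1-based ranks and positions as in the proof), and it enumerates all of $A^n$, so it only shows the definitions at work and is not meant as an efficient implementation.

```python
from itertools import product


def pattern_free(x, t):
    """True if the word x contains no occurrence of 2 followed by t - 1 ones."""
    pattern = (2,) + (1,) * (t - 1)
    return all(tuple(x[i:i + t]) != pattern for i in range(len(x) - t + 1))


def bit_extraction_table(n, t, a):
    """Brute-force sketch of (N_{n,t}, F_{n,t}, G_{n,t}) from the proof of Theorem 4.

    Returns a dict mapping each pattern-free x in A^n to the triple (N, F, G).
    """
    # C in lexicographic order; G(x) is the 1-based position of count(x) in C.
    C = [m for m in product(range(n + 1), repeat=a) if sum(m) == n]
    position = {m: g for g, m in enumerate(C, start=1)}
    # Classes D_m, each listed in lexicographic order (product() yields that order).
    classes = {}
    for x in product(range(1, a + 1), repeat=n):
        if pattern_free(x, t):
            m = tuple(x.count(i) for i in range(1, a + 1))
            classes.setdefault(m, []).append(x)
    table = {}
    for m, D in classes.items():
        d = len(D)
        # Binary expansion d_m = sum_i 2^{r_{m,i}} with r_{m,1} > r_{m,2} > ...
        r = [e for e in range(d.bit_length() - 1, -1, -1) if (d >> e) & 1]
        for rank, x in enumerate(D, start=1):
            total = 0
            for r_k in r:  # k*(x): first k with 2^{r_1} + ... + 2^{r_k} >= rank(x)
                total += 1 << r_k
                if total >= rank:
                    break
            value = total - rank  # lies in {0, ..., 2^{r_k} - 1}
            bits = format(value, "b").zfill(r_k) if r_k > 0 else ""
            table[x] = (r_k, bits, position[m])
    return table
```

Running it for, say, $n = 4$, $t = 2$, $a = 3$ and grouping the outputs by $(G, N)$ shows each group receiving every binary string of length $N$ exactly once, which is the mechanism behind claim (i).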

3 Construction of the homomorphism 

We now construct the homomorphism $\varphi$ that will be used to prove Theorem 1. We call the sequence $(x_i)_{i \in \mathbb{Z}}$ the input sequence, and the resulting sequence $(\varphi(x)_i)_{i \in \mathbb{Z}}$ the output sequence. For convenience, we fix a source probability vector $p = (p(i))_{i \in A}$, which we assume has full support and satisfies $h(p) \ge h(q) + \varepsilon$, and denote $P = P_p$. We may also assume without loss of generality that $q(i) > 0$ for all $i \in B$. For an alphabet $A$, denote by $A^* = \bigcup_{n \ge 0} A^n$ the set of finite words over $A$. For each $w \in A^*$, denote by $\mathrm{length}(w)$ the length of $w$.

The construction is done in several stages. Here is an informal description 
of the steps, which are also drawn schematically in Figure 1. 

Step 0: Fix a parameter $t \in \mathbb{N}$, the marker length. Its value will be some large integer that will be determined later, and will only depend on the target distribution $q$ and the entropy gap bound $\varepsilon$.

Step 1: Divide the input sequence into blocks. A marker is an index $i$ for which

$$x_i = 2, \quad x_{i+1} = x_{i+2} = \cdots = x_{i+t-1} = 1.$$

Enumerate the markers as $\ldots, R_{-2}, R_{-1}, R_0, R_1, \ldots$, where $R_1$ is the first marker to the right of the origin. A block is the set of indices between two consecutive markers, namely $\{i : R_k \le i < R_{k+1}\}$. The input word associated with block $k$ is the sequence $W_k = (x_i)_{R_k + t \le i \le R_{k+1} - 1}$, namely the sequence of input symbols in block $k$, not including the "$21\ldots1$" patterns.

Under the measure $P$, the input words $\ldots, W_{-1}, W_0, W_1, \ldots$ are independent $A^*$-valued random variables. The words $(W_k)_{k \in \mathbb{Z} \setminus \{0\}}$ are identically distributed. (Note that $W_0$ has a different distribution owing to "size-biasing".)
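Step 1 on a finite stretch of input can be sketched in Python as follows. This is an illustration with assumptions of our own: the two-sided sequence is replaced by a finite list with 0-based indices, the function names are hypothetical, and incomplete blocks at the two ends of the window are simply dropped.

```python
def find_markers(x, t):
    """0-based indices i of markers: x[i] == 2 and x[i+1] == ... == x[i+t-1] == 1."""
    pattern = [2] + [1] * (t - 1)
    return [i for i in range(len(x) - t + 1) if x[i:i + t] == pattern]


def split_into_words(x, t):
    """Markers R and input words W_k = x[R_k + t : R_{k+1}] inside a finite window.

    Only blocks delimited by two markers inside the window are returned.
    """
    R = find_markers(list(x), t)
    words = [list(x[r + t:r_next]) for r, r_next in zip(R, R[1:])]
    return R, words
```

For instance, with $t = 3$ the list `[1, 2, 1, 1, 3, 2, 1, 1, 2, 2, 1, 1]` has markers at indices 1, 5 and 9, and input words `[3]` and `[2]`; each word has length $R_{k+1} - R_k - t$.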
Figure 1: Illustration of $\varphi$. (Schematic panels: division into blocks; computation of the associated bit strings; Step (3,0): the simulators are running; ... Step (3,8) ...; assignment of the $B$-symbols to the output locations. Legend: running; completed simulation; queued up; dispensing output symbols.)
All the input words have the property that, conditioned on the length of $W_k$ being equal to $n$, $W_k$ has distribution $p_{n,t}$.

Step 2: Apply to each input word $W_k$ the function $F_{n,t}$ from Section 2.2, where $n$ is the length of $W_k$, to obtain a string $U_k = F_{n,t}(W_k)$ of $N_{n,t}(W_k)$ independent unbiased random bits. $U_k$ is called the bit string associated with block $k$.

Step 3: For each block $k \in \mathbb{Z}$, attempt to use the random bits in $U_k$ to simulate a $B^{R_{k+1}-R_k}$-valued random variable $B_k$ with distribution $q^{R_{k+1}-R_k}$, using the simulation $(T, S)$ from Section 2.1. In many blocks, the stopping time will be reached. If the stopping time is reached, $U_k$ may contain unused bits, which are still independent and unbiased. For any block $k$ whose stopping time is not reached, look at $U_{k+1}$ in the next block to the right to find unused bits to continue the simulation. If now the stopping time is reached, compute $B_k$. If not, iterate, looking one block further to the right at each step for unused bits, until the stopping time is reached. This iteration is done simultaneously for all blocks, in order to maintain translation-equivariance of the construction. A further complication arises because some bit strings may have length zero, so two or more simulators may try to read the same bits at the same time. In such situations we give priority to the simulator belonging to the rightmost block. We will refer to such a situation as a queue-up.

The ergodic theorem will ensure that, for a proper choice of the parameter $t$, almost surely enough bits are present overall to complete the process for all $k \in \mathbb{Z}$. Having computed $B_k$, the $B$-symbols it contains are assigned to the indices of the $k$-th block, to produce the output sequence $\varphi(x)$.

We will see that each block length has exponential tails, and that for each $k \in \mathbb{Z}$ the number of blocks which must be examined in Step 3 to simulate $B_k$ has exponential tails. It follows that this homomorphism has exponential tails.

Formal definition of $\varphi$

Let $x = (x_i)_{i \in \mathbb{Z}}$. Let $t$, the marker length, be a positive integer to be chosen later.

We first define the marker locations $(R_k)_{k \in \mathbb{Z}}$. Let

$$R_1 = \min\{i > 0 : x_i = 2,\ x_{i+1} = x_{i+2} = \cdots = x_{i+t-1} = 1\}.$$

Inductively, for $k \ge 2$, let

$$R_k = \min\{i > R_{k-1} : x_i = 2,\ x_{i+1} = x_{i+2} = \cdots = x_{i+t-1} = 1\}.$$

Let

$$R_0 = \max\{i \le 0 : x_i = 2,\ x_{i+1} = x_{i+2} = \cdots = x_{i+t-1} = 1\}.$$

Inductively, for $k \ge 1$, let

$$R_{-k} = \max\{i < R_{-k+1} : x_i = 2,\ x_{i+1} = x_{i+2} = \cdots = x_{i+t-1} = 1\}.$$

It follows from well-known facts of elementary probability theory that the $(R_k)_{k \in \mathbb{Z}}$ are defined for $P$-almost every input sequence $(x_i)_{i \in \mathbb{Z}}$. For all $k \in \mathbb{Z}$, let $\{i : R_k \le i < R_{k+1}\}$ be the $k$-th block. Let $W_k = (x_i)_{R_k + t \le i \le R_{k+1} - 1}$ be the input word associated with block $k$, an $A^*$-valued random variable, and let $L_k = \mathrm{length}(W_k) = R_{k+1} - R_k - t$ be its length. Let $\lambda_k = R_{k+1} - R_k$. Note that by definition, $W_k$ does not contain the pattern "2 followed by $t-1$ 1's".

Let $N_{n,t}, F_{n,t}, G_{n,t}$ be as in Section 2.2. For all $k \in \mathbb{Z}$, let $U_k = F_{L_k,t}(W_k)$. $U_k$ is a $\{0,1\}^*$-valued random variable, called the bit string associated with block $k$. Denote its length by $V_k = \mathrm{length}(U_k) = N_{L_k,t}(W_k)$. Let $U_k = (e_k(1), e_k(2), \ldots, e_k(V_k))$ be the bits comprising $U_k$.

For any $\ell \in \mathbb{N}$, let the pair $(T_\ell, S_\ell)$ be a simulation of the distribution $q^\ell$ from independent unbiased bits, as in Section 2.1. (Recall that $T_\ell$ is the stopping time and $S_\ell$ is the output symbol of the simulation.) For an input $x \in \{0,1\}^*$, say that the simulation $(T_\ell, S_\ell)$ is successful for input $x$ if for some $y \in \{0,1\}^{\mathbb{N}}$ (and hence also, by the definition of a simulation, for all $y \in \{0,1\}^{\mathbb{N}}$) we have $T_\ell(x * y) \le \mathrm{length}(x)$, where $x * y$ is the concatenation of $x$ with $y$.

In Step 3, for each $k \in \mathbb{Z}$ we will generate a $B^*$-valued random variable $B_k = (\beta_k(1), \ldots, \beta_k(\lambda_k))$ such that, conditioned on $(\lambda_k)_{k \in \mathbb{Z}}$, the $B_k$ are independent and each $B_k$ has distribution $q^{\lambda_k}$. To do this, in each Step $(3,n)$ for $n \ge 0$, to each block $k \in \mathbb{Z}$ we assign the following: a pair $(J_k^n, M_k^n)$ with $J_k^n \ge k$ and $1 \le M_k^n \le V_{J_k^n}$, called the position of the $k$-th simulator at Step $(3,n)$ (here "position $(j,m)$" refers to the $m$-th bit of block $j$); a word $Z_k^n \in \{0,1\}^*$ called the input read by the $k$-th simulator by Step $(3,n)$; and a set $Q_k^n$ of pairs $(j,m) \in \mathbb{Z} \times \mathbb{N}$ such that $Z_k^n$ is the concatenation of all the bits $e_j(m)$, $(j,m) \in Q_k^n$, arranged in lexicographical order on $(j,m)$, called the set of positions used by the $k$-th simulator by Step $(3,n)$. If for input $Z_k^n$ the simulation $(T_{\lambda_k}, S_{\lambda_k})$ is successful, then let $B_k = S_{\lambda_k}(Z_k^n * y)$ for some (and hence all) $y \in \{0,1\}^{\mathbb{N}}$ and say that $B_k$ was computed by Step $(3,n)$. For a pair $(j,m)$ with $j \in \mathbb{Z}$, $V_j > 0$ and $1 \le m \le V_j$, denote by

$$\mathrm{NEXT}(j,m) = \begin{cases} (j,\ m+1) & \text{if } m < V_j, \\ (\min\{j' > j : V_{j'} > 0\},\ 1) & \text{if } m = V_j, \end{cases}$$

the next bit position after $(j,m)$ (which is a random variable).

Step (3,0): For all $k \in \mathbb{Z}$, set $Q_k^0 = \emptyset$ (the empty set) and let $Z_k^0$ be the empty string. Set $J_k^0 = \min\{j \ge k : V_j > 0\}$ and $M_k^0 = 1$. (It follows easily from Lemmas 6 and 7 below that the $J_k^0$ are a.s. defined and finite.) No $B_k$'s are computed.

Step (3,n): For each $k \in \mathbb{Z}$, if $B_k$ was computed by Step $(3, n-1)$, set $(J_k^n, M_k^n) = (J_k^{n-1}, M_k^{n-1})$, $Q_k^n = Q_k^{n-1}$ and $Z_k^n = Z_k^{n-1}$. Otherwise, check if position $(J_k^{n-1}, M_k^{n-1})$ was used by some simulator by Step $(3, n-1)$, i.e. whether $(J_k^{n-1}, M_k^{n-1})$ is in $\bigcup_{k' \in \mathbb{Z}} Q_{k'}^{n-1}$. If yes: set $Q_k^n = Q_k^{n-1}$, $Z_k^n = Z_k^{n-1}$, and $(J_k^n, M_k^n) = \mathrm{NEXT}(J_k^{n-1}, M_k^{n-1})$.

If position $(J_k^{n-1}, M_k^{n-1})$ was not used by any simulator by Step $(3, n-1)$: check if for some $k' > k$ we have $(J_{k'}^{n-1}, M_{k'}^{n-1}) = (J_k^{n-1}, M_k^{n-1})$, a phenomenon we refer to as a queue-up of the $k$-th simulator. If there is a queue-up, set $Q_k^n = Q_k^{n-1}$, $Z_k^n = Z_k^{n-1}$ and $(J_k^n, M_k^n) = \mathrm{NEXT}(J_k^{n-1}, M_k^{n-1})$.

If the $k$-th simulator is not queued up: set $Q_k^n = Q_k^{n-1} \cup \{(J_k^{n-1}, M_k^{n-1})\}$ and $Z_k^n = Z_k^{n-1} * e_{J_k^{n-1}}(M_k^{n-1})$. If now the simulation $(T_{\lambda_k}, S_{\lambda_k})$ is successful for input $Z_k^n$, set $B_k = S_{\lambda_k}(Z_k^n * y)$ for some (and hence all) $y \in \{0,1\}^{\mathbb{N}}$, set $(J_k^n, M_k^n) = (J_k^{n-1}, M_k^{n-1})$, and say that $B_k$ was computed at Step $(3,n)$. Otherwise, set $(J_k^n, M_k^n) = \mathrm{NEXT}(J_k^{n-1}, M_k^{n-1})$.

We will show later that, if the marker length $t$ is chosen large enough, then for $P$-almost every input sequence $(x_i)_{i \in \mathbb{Z}}$, for all $k \in \mathbb{Z}$ there exists an $n \ge 1$ for which $B_k$ was computed by Step $(3,n)$. So all the $B_k$'s are a.s. defined $B^*$-valued random variables.
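The synchronous dynamics of Step (3,n) are easier to follow in code. The following Python sketch runs them on a finite window of blocks; it is a hypothetical illustration, not part of the construction: `run_step3` and the callback `done(k, z)`, which stands in for "the simulation $(T_{\lambda_k}, S_{\lambda_k})$ is successful for input $z$", are our own names, blocks are numbered $0, \ldots, K-1$, and a simulator that runs past the right end of the window is simply left unfinished.

```python
def run_step3(U, done, max_steps=10_000):
    """Synchronous Step (3, n) dynamics on a finite window of blocks (sketch).

    U    -- U[k] is the bit string (a str of '0'/'1') of block k, k = 0..K-1
    done -- done(k, z) -> bool, True once the bits z read by simulator k suffice
    Returns {k: tuple of positions (j, m) used by simulator k} for the simulators
    that finish inside the window; m is the 1-based bit index within block j.
    """
    K = len(U)
    order = [(j, m) for j in range(K) for m in range(1, len(U[j]) + 1)]
    nxt = dict(zip(order, order[1:]))  # NEXT(j, m); a missing key means "off the window"
    # Step (3, 0): start at the first bit of the first nonempty block j >= k.
    pos = {k: next(((j, 1) for j in range(k, K) if U[j]), None) for k in range(K)}
    used = {k: [] for k in range(K)}
    read = {k: "" for k in range(K)}
    finished = set()
    for _ in range(max_steps):
        all_used = {p for ps in used.values() for p in ps}  # state after step n - 1
        new_pos = {}
        for k in range(K):
            p = pos[k]
            if k in finished or p is None:
                new_pos[k] = p
            elif p in all_used:
                new_pos[k] = nxt.get(p)  # bit already used: move on
            elif any(pos[k2] == p for k2 in range(k + 1, K)):
                new_pos[k] = nxt.get(p)  # queue-up: the rightmost simulator reads first
            else:
                used[k].append(p)
                j, m = p
                read[k] += U[j][m - 1]
                if done(k, read[k]):
                    finished.add(k)
                    new_pos[k] = p  # a finished simulator stays on its last bit
                else:
                    new_pos[k] = nxt.get(p)
        pos = new_pos
        if all(k in finished or pos[k] is None for k in range(K)):
            break
    return {k: tuple(used[k]) for k in sorted(finished)}
```

In the example `run_step3(["10", "", "1", "011"], lambda k, z: len(z) >= [1, 2, 1, 1][k])`, simulators 0, 2 and 3 finish in the first step; simulator 1 is queued up behind simulator 2 at position (2, 1), then skips the used position (3, 1), and completes with positions (3, 2) and (3, 3).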

Note that it is immediate from the definition that $\mathrm{length}(B_k) = \lambda_k = R_{k+1} - R_k$. Let $B_k = (\beta_k(1), \beta_k(2), \ldots, \beta_k(\lambda_k))$ be the $B$-symbols comprising $B_k$. For each $i \in \mathbb{Z}$, we define $(\varphi(x))_i$ as follows. Let $K(i) \in \mathbb{Z}$ be the index of the block containing $i$, namely the unique $k \in \mathbb{Z}$ for which $R_k \le i < R_{k+1}$, and set

$$(\varphi(x))_i = \beta_{K(i)}\big(i - R_{K(i)} + 1\big).$$

This completes the formal definition of $\varphi$. Figure 1 shows a schematic illustration of the construction. Table 1 summarizes our main notation for convenient reference.
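The assignment of output symbols can be sketched in Python on a finite window. With $B_k$ indexed from 1, position $i$ of block $K(i)$ receives the symbol with index $i - R_{K(i)} + 1$, which is list index $i - R_{K(i)}$ in 0-based Python. `output_symbol` is a hypothetical helper, and $K(i)$ is found by binary search over the marker positions.

```python
from bisect import bisect_right


def output_symbol(i, R, blocks):
    """phi(x)_i on a finite window (sketch).

    R      -- increasing marker positions R_0 < R_1 < ... of the window
    blocks -- blocks[k] = (beta_k(1), ..., beta_k(lambda_k)), lambda_k = R[k+1] - R[k]
    """
    K = bisect_right(R, i) - 1  # K(i): the unique k with R[k] <= i < R[k+1]
    if K < 0 or K >= len(blocks):
        raise ValueError("position i lies outside the window")
    beta = blocks[K]
    if len(beta) != R[K + 1] - R[K]:
        raise ValueError("length(B_k) must equal lambda_k")
    return beta[i - R[K]]  # 0-based index i - R_K is the 1-based index i - R_K + 1
```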
Table 1: Summary of notation

| Symbol | Meaning |
|---|---|
| $A^*$ | finite words over $A$ |
| $(T_\ell, S_\ell)$ | simulation of $q^\ell$ from independent unbiased bits |
| $(N_{n,t}, F_{n,t}, G_{n,t})$ | simulation of independent unbiased bits from pattern-free block |
| $\pi$ | "forbidden pattern" $(2, 1, 1, \ldots, 1)$ ($t-1$ ones) |
| $E_{n,t}$ | $A$-valued $n$-tuples not containing $\pi$ |
| $x = (x_i)_{i \in \mathbb{Z}}$ | input sequence |
| $A = \{j : 1 \le j \le a\}$ | source alphabet |
| $B = \{j : 1 \le j \le b\}$ | target alphabet |
| $p = (p(j))_{1 \le j \le a}$ | source distribution |
| $q = (q(j))_{1 \le j \le b}$ | target distribution |
| $\varepsilon$ | lower bound on the entropy gap $h(p) - h(q)$ |
| $p_{n,t}$ | $p^n$ conditioned on $E_{n,t}$ |
| $t$ | marker length |
| $R_k$ | $k$-th marker |
| $\lambda_k = R_{k+1} - R_k$ | length of $k$-th block |
| $W_k = (x_i)_{R_k + t \le i \le R_{k+1} - 1}$ | $k$-th input word |
| $L_k = \lambda_k - t$ | length of $W_k$ |
| $U_k = (e_k(1), \ldots, e_k(V_k))$ | bit string associated with block $k$ |
| $V_k$ | length of $U_k$ |
| $(J_k^n, M_k^n)$ | position of the $k$-th simulator at Step $(3,n)$ |
| $Z_k^n$ | input read by the $k$-th simulator by Step $(3,n)$ |
| $Q_k^n$ | positions used by the $k$-th simulator by Step $(3,n)$ |
| $\mathrm{NEXT}(j,m)$ | next bit position after $(j,m)$ |
| $B_k = (\beta_k(1), \ldots, \beta_k(\lambda_k))$ | $B^*$-valued r.v. computed by the $k$-th simulator |
| $K(i)$ | index of block containing $i$ |
| $\varphi(x) = (\varphi(x)_i)_{i \in \mathbb{Z}}$ | output sequence |
4 Proof of Theorem 1



Lemma 5 The $A^*$-valued random variables $(W_k)_{k \in \mathbb{Z}}$ are independent. For each $k \in \mathbb{Z}$ and $n \in \{0, 1, 2, \ldots\}$, $W_k$ conditioned on $\{\mathrm{length}(W_k) = n\}$ has distribution $p_{n,t}$. The non-central block lengths $(R_{k+1} - R_k)_{k \in \mathbb{Z} \setminus \{0\}}$ are identically distributed with exponential tails, and the central block length $R_1 - R_0$ has exponential tails.

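Proof. Let $\mu_0, \mu_1$ be the measures on $A^*$ defined as follows:

$$\mu_0(\{(a_1, \ldots, a_n)\}) = \begin{cases} (n+t)\, p(2)^2 p(1)^{2(t-1)} \prod_{j=1}^{n} p(a_j), & (a_1, \ldots, a_n) \in E_{n,t}, \\ 0, & \text{otherwise}, \end{cases}$$

$$\mu_1(\{(a_1, \ldots, a_n)\}) = \begin{cases} p(2)\, p(1)^{t-1} \prod_{j=1}^{n} p(a_j), & (a_1, \ldots, a_n) \in E_{n,t}, \\ 0, & \text{otherwise}. \end{cases}$$

We claim that for any $j \ge 0$ and $(w_k)_{-j \le k \le j} \subset A^*$,

$$P\big((W_k)_{-j \le k \le j} = (w_k)_{-j \le k \le j}\big) = \mu_0(w_0) \prod_{-j \le k \le j,\ k \ne 0} \mu_1(w_k). \tag{2}$$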



This will prove that $(W_k)_{k \in \mathbb{Z}}$ are independent, with $W_0$ having distribution $\mu_0$ and all the other $W_k$'s having distribution $\mu_1$, both $\mu$'s clearly having the desired property that conditioning on the length $n$ gives $p_{n,t}$. (Incidentally, it will also prove that $\mu_0, \mu_1$ are probability measures, although this can be checked directly.) Indeed, to prove (2), let $\pi = 21\ldots1$ be the word "2 followed by $t-1$ 1's", and let $w$ be the word obtained by the concatenation

$$w = \pi * w_{-j} * \pi * w_{-j+1} * \cdots * \pi * w_j * \pi.$$

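Denote $\mathrm{len}^+ = \sum_{1 \le k \le j} \mathrm{length}(w_k)$ and $\mathrm{len}^- = \sum_{-j \le k < 0} \mathrm{length}(w_k)$, so that $\mathrm{length}(w) = (2j+2)t + \mathrm{len}^- + \mathrm{length}(w_0) + \mathrm{len}^+$, and for $r \ge 0$ let $s_r = r + jt + \mathrm{len}^-$. Then, if none of the $w_k$'s contains the pattern $\pi$, we have

$$\big\{(W_k)_{-j \le k \le j} = (w_k)_{-j \le k \le j}\big\} = \bigcup_{r=0}^{\mathrm{length}(w_0)+t-1} \big\{(x_i)_{-s_r \le i < -s_r + \mathrm{length}(w)} = w\big\},$$

and furthermore the above union is disjoint. (In words, this simply means that $(W_k)_{-j \le k \le j} = (w_k)_{-j \le k \le j}$ if and only if, for some $r$, the string of $x_i$'s around the origin in which the 0-th block begins $r$ positions to the left of the origin is equal to $w$. This follows directly from the definitions.) Therefore

$$P\big((W_k)_{-j \le k \le j} = (w_k)_{-j \le k \le j}\big) = \sum_{r=0}^{\mathrm{length}(w_0)+t-1} P\big((x_i)_{-s_r \le i < -s_r + \mathrm{length}(w)} = w\big) = (\mathrm{length}(w_0) + t)\, p^{\mathrm{length}(w)}(w) = \mu_0(w_0) \prod_{-j \le k \le j,\ k \ne 0} \mu_1(w_k).$$

If some $w_k$ contains $\pi$, the event on the left-hand side of (2) is empty, and (2) holds trivially.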

It remains to prove that the block lengths $R_{k+1} - R_k$ have exponential tails, or equivalently to prove the same for $L_k = \mathrm{length}(W_k) = R_{k+1} - R_k - t$. Denote $c = p^t(\pi) = p(2)p(1)^{t-1}$. Then for $k \ne 0$ and $n \ge t$,

$$P(R_{k+1} - R_k = n) = c\, P\big((x_i)_{i=1}^{n-t} \text{ does not contain } \pi\big) \le c\, P\big((x_i)_{i=(j-1)t+1}^{jt} \ne \pi,\ j = 1, 2, \ldots, \lfloor (n-t)/t \rfloor\big) = c(1-c)^{\lfloor (n-t)/t \rfloor},$$

which decays exponentially in $n$. Similarly, since the distribution $\mu_0$ is at most a factor $O(n)$ times $\mu_1$, $L_0 = \mathrm{length}(W_0)$ also has exponential tails. □
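The block-length distribution in Lemma 5 can be computed exactly for a concrete source, which also gives a numerical check of Kac's formula used in the proof of Lemma 7. The Python sketch is an illustration with hypothetical names and a source distribution given as a dict: it evaluates $P(R_{k+1} - R_k = n) = c\,P((x_i)_{i=1}^{n-t} \text{ does not contain } \pi)$ for $k \ne 0$ (the $\mu_1$-probability of the lengths) by dynamic programming over the length of the current partial match of $\pi$, which suffices because the symbol 2 occurs only at the start of $\pi$.

```python
def pattern_free_probs(p, t, M):
    """[P((x_1, ..., x_m) does not contain pi) for m = 0..M] for i.i.d. x_i ~ p.

    p -- dict mapping each symbol of A to its probability
    pi is 2 followed by t - 1 ones; the DP state is the length of the current
    partial match of pi (0..t-1).
    """
    dist = [1.0] + [0.0] * (t - 1)  # dist[s]: pattern-free so far, s symbols of pi matched
    probs = [1.0]
    for _ in range(M):
        new = [0.0] * t
        for s, mass in enumerate(dist):
            for sym, prob in p.items():
                if sym == 2:
                    ns = 1  # a 2 always (re)starts a match
                elif sym == 1 and s >= 1:
                    ns = s + 1  # a 1 extends a partial match
                else:
                    ns = 0
                if ns < t:  # ns == t means pi has just been completed
                    new[ns] += mass * prob
        dist = new
        probs.append(sum(dist))
    return probs


def block_length_pmf(p, t, n_max):
    """[P(R_{k+1} - R_k = n) for n = 0..n_max] for k != 0."""
    c = p[2] * p[1] ** (t - 1)
    free = pattern_free_probs(p, t, max(n_max - t, 0))
    return [c * free[n - t] if n >= t else 0.0 for n in range(n_max + 1)]
```

For $p = (0.5, 0.3, 0.2)$ and $t = 3$, the computed probabilities sum to 1, have mean $1/c = 1/(0.3 \cdot 0.5^2) \approx 13.33$, and stay below $c(1-c)^{\lfloor (n-t)/t \rfloor}$.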



Lemma 6 The associated bit strings $(U_k)_{k \in \mathbb{Z}}$ are independent $\{0,1\}^*$-valued random variables. For each $k \in \mathbb{Z}$ and $n \in \{0, 1, 2, \ldots\}$, $U_k$ conditioned on $\{V_k = n\}$ has the uniform distribution on $\{0,1\}^n$. The $(U_k)_{k \in \mathbb{Z} \setminus \{0\}}$ are identically distributed. The $(V_k)_{k \in \mathbb{Z}}$ have exponential tails.

Proof. It is immediate from Lemma 5 that $(W_k)_{k \in \mathbb{Z}}$ are independent and $(W_k)_{k \in \mathbb{Z} \setminus \{0\}}$ are identically distributed; therefore the same holds for the sequence $(U_k)$, as required. Now, for any $k \in \mathbb{Z}$, $V_k = N_{L_k,t}(W_k) \le (\log a / \log 2)\, L_k$ by Theorem 4(iii), so it has exponential tails by Lemma 5. Finally, let $k \in \mathbb{Z}$ and $n \in \{0, 1, 2, \ldots\}$ with $P(V_k = n) > 0$. Then by Theorem 4 and Lemma 5, for any $w \in \{0,1\}^n$,

$$\begin{aligned}
P(U_k = w \mid V_k = n) &= \sum_m P(L_k = m \mid V_k = n)\, P(U_k = w \mid V_k = n,\ L_k = m) \\
&= \sum_m P(L_k = m \mid V_k = n)\, P\big(F_{m,t}(W_k) = w \mid \mathrm{length}(W_k) = m,\ N_{m,t}(W_k) = n\big) \\
&= \sum_m P(L_k = m \mid V_k = n) \cdot 2^{-n} = 2^{-n},
\end{aligned}$$

where the sums are over all $m$ with $P(V_k = n, L_k = m) > 0$. □

Lemma 7 The marker length $t$ may be chosen so that for $k \ne 0$,

$$E(V_k) > E(T_{\lambda_k}). \tag{4}$$

Proof. Consider temporarily a new probability measure $\tilde P$, with expectation operator $\tilde E$, under which $W_0$ has the same distribution as $W_1$ and is independent of all other random variables, while all other random variables have the same distribution as before. By Lemma 5 and the computation in its proof, the process $(W_k)_{k\in\mathbb Z}$ is now isomorphic to the induced dynamical system $B(p)|_M$, where

$$M = \{x \in A^{\mathbb Z} : (x_i)_{i=0}^{t-1} = \pi\}.$$

(Recall that $\pi = (2, 1, \ldots, 1)$ is the forbidden pattern.) By Abramov's formula ([?], pp. 257-259), this dynamical system has entropy $h(W_1) = h(p)/p^*(\pi)$. This is also equal to $h(p)\tilde E(\Lambda_1) = h(p)E(\Lambda_1)$, since by Kac's formula ([?], p. 46) the expected return time (or expected block length) is the reciprocal of the probability of the inducing set.
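For the marker $\pi = (2, 1, \ldots, 1)$, Kac's formula $E(\Lambda_1) = 1/p^*(\pi)$ can be confirmed numerically. Since $\pi$ cannot overlap a shifted copy of itself, the distance between the starting points of consecutive markers has the law of the waiting time $\tau$ until the first completed copy of $\pi$ in a fresh i.i.d. sequence, and $E(\tau) = \sum_{n \ge 0} P(\tau > n)$. This Python sketch evaluates the series (truncated once the terms fall below a tolerance); the distribution `p` is a hypothetical example.

```python
def expected_gap(p, t, tol=1e-13):
    """E(tau), tau = waiting time until the first copy of pi = (2, 1, ..., 1) is completed.

    Uses E(tau) = sum_{n >= 0} P(tau > n). P(tau > n) is obtained by tracking the length s
    of the current partial match of pi (0 <= s < t) through n i.i.d. symbols with law p.
    """
    assert t >= 2
    p1, p2 = p[1], p[2]
    p_other = 1.0 - p1 - p2
    dist = [1.0] + [0.0] * (t - 1)   # dist[s] = P(tau > n, partial match of length s)
    total = 0.0
    while True:
        survive = sum(dist)          # P(tau > n) for the current n
        total += survive
        if survive < tol:            # remaining terms are negligible
            return total
        new = [0.0] * t
        for s, mass in enumerate(dist):
            new[1] += mass * p2          # '2' starts a new partial match
            new[0] += mass * p_other     # symbols >= 3 reset the match
            if s == 0:
                new[0] += mass * p1
            elif s + 1 < t:
                new[s + 1] += mass * p1  # '1' extends the match; completion drops the mass
        dist = new


p = {1: 0.5, 2: 0.3, 3: 0.2}   # hypothetical source distribution
print(expected_gap(p, 3), 1 / (p[2] * p[1] ** 2))
```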

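By Theorem ??(ii), the mapping $W_1 \mapsto (U_1, L_1, G_{L_1,t}(W_1))$ is injective. Therefore, using Lemma 6 and elementary properties of the entropy function,

$$
\begin{aligned}
h(p)E(\Lambda_1) = h(W_1) &= h(U_1, L_1, G_{L_1,t}(W_1)) \le h(U_1) + h(L_1, G_{L_1,t}(W_1)) \\
&= h(U_1 \mid V_1) + h(V_1) + h(L_1, G_{L_1,t}(W_1)) \\
&= E(V_1)\log 2 + h(V_1) + h(L_1, G_{L_1,t}(W_1)).
\end{aligned}
$$

Here $h(U_1) = h(U_1 \mid V_1) + h(V_1)$ because $V_1 = \mathrm{length}(U_1)$ is a function of $U_1$, and $h(U_1 \mid V_1) = E(V_1)\log 2$ because, given $V_1 = n$, $U_1$ is uniform on $\{0,1\}^n$.

By Theorem ?? we have

$$E(T_{\Lambda_1})\log 2 \le 6\log 2 + E(\Lambda_1)h(q) \le 6\log 2 + E(\Lambda_1)(h(p) - \varepsilon).$$

Combining the last two inequalities gives

$$E(V_1) - E(T_{\Lambda_1}) \ge \frac{\varepsilon}{\log 2}E(\Lambda_1) - 6 - \frac{h(V_1) + h(L_1, G_{L_1,t}(W_1))}{\log 2}.$$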

To bound the negative terms on the right-hand side, recall the following properties of the entropy of integer-valued random variables (see [?], Lemma 12.10.2): if $X$ is a random variable with finite expectation that takes values in $\mathbb N$, then $h(X) \le h(Y)$, where $Y$ has a geometric distribution with $E(Y) = E(X)$. Furthermore, $h(Y) = O(\log E(Y))$ when the expectation is large. This implies, using the fact that $V_1 \le (\log a/\log 2)L_1$, that for some positive constant $C$ (that depends on $a$),

$$h(L_1) \le C\log E(\Lambda_1),$$

$$h(V_1) \le C\log E(\Lambda_1),$$

$$h(L_1, G_{L_1,t}(W_1)) = h(L_1) + h(G_{L_1,t}(W_1) \mid L_1) \le C\log E(\Lambda_1) + (a-1)\log E(\Lambda_1)$$

(because the range of $G_{n,t}$ is $\{1, 2, \ldots, n^{a-1}\}$, and by Jensen's inequality $E(\log L_1) \le \log E(\Lambda_1)$). Therefore

$$E(V_1) - E(T_{\Lambda_1}) \ge f(E(\Lambda_1))$$

for the function

$$f(u) = f_\varepsilon(u) = \frac{\varepsilon}{\log 2}u - 6 - C'\log u,$$

where $C' > 0$ is a constant that depends only on $a$. Now, $f(u) \to \infty$ as $u \to \infty$. Choosing the marker length $t$ sufficiently large will force $E(\Lambda_1) = 1/p^*(\pi)$ to be large, uniformly over all probability vectors $p$ under consideration, i.e., those that satisfy $h(p) \ge h(q) + \varepsilon$ and $p(i) > 0$ for all $i \in A$ (note that $p(1)$ is bounded away from 1 because the alphabet size is fixed and $h(p)$ is bounded away from 0). So for sufficiently large $t$ we get $E(V_1) - E(T_{\Lambda_1}) > 0$, proving the lemma. □
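Two facts used in this proof can be illustrated numerically: the entropy of a geometric law grows only logarithmically in its mean, and $f_\varepsilon(E(\Lambda_1))$ becomes positive once $t$ is large, because $E(\Lambda_1) = 1/(p(2)p(1)^{t-1})$ grows exponentially in $t$. In this Python sketch entropies are in nats. The argument `c_prime` stands for the unspecified constant $C'$ of the proof; its value, like `p1`, `p2` and `eps`, is a hypothetical choice made only for illustration.

```python
import math


def geom_entropy(mu):
    """Entropy (nats) of the geometric law on {1, 2, ...} with mean mu >= 1.

    With success probability 1/mu this equals log(mu) - (mu - 1) log(1 - 1/mu),
    which lies in (log mu, log mu + 1] for mu > 1.
    """
    if mu == 1:
        return 0.0
    return math.log(mu) - (mu - 1) * math.log(1 - 1 / mu)


def f_eps(u, eps, c_prime):
    """f(u) = eps * u / log 2 - 6 - C' log u."""
    return eps * u / math.log(2) - 6 - c_prime * math.log(u)


def smallest_marker_length(p1, p2, eps, c_prime, t_max=200):
    """Smallest t >= 2 with f(E(Lambda_1)) > 0, where E(Lambda_1) = 1 / (p2 * p1**(t-1))."""
    for t in range(2, t_max + 1):
        if f_eps(1 / (p2 * p1 ** (t - 1)), eps, c_prime) > 0:
            return t
    raise ValueError("no suitable t found below t_max")


print(geom_entropy(10.0), math.log(10.0))
print(smallest_marker_length(p1=0.5, p2=0.3, eps=0.1, c_prime=5.0))
```

The search terminates because, for $p(1) < 1$, the argument of $f_\varepsilon$ grows geometrically in $t$ while the negative part of $f_\varepsilon$ grows only logarithmically.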

From now on we consider $t$ as having a fixed value for which (4) holds. Note that in particular it follows from (4) and Lemma 6 that almost surely, $V_k > 0$ for infinitely many positive values of $k$. Therefore $(J_k^n, M_k^n)$ in Steps (3,0), (3,n) are a.s. defined.

Recall the notion of stochastic domination. Let $\Delta$ be a compact metric space on which there is defined a partial order $\preceq$, and assume that $\preceq$ is a closed subset of $\Delta \times \Delta$. If $X, Y$ are two random variables (not necessarily defined on the same probability space) taking values in $\Delta$, denote $X \preceq_{\mathrm{stoc}} Y$ (read: "$X$ is stochastically dominated by $Y$") if for any $e \in \Delta$ we have

$$P(X \succeq e) \le P(Y \succeq e).$$

It is known (see [?], Th. 2.4, p. 71) that, under the above topological assumptions, $X \preceq_{\mathrm{stoc}} Y$ if and only if there exist random variables $X', Y'$, defined on the same probability space, such that $X'$ has the same distribution as $X$, $Y'$ has the same distribution as $Y$, and

$$P(X' \preceq Y') = 1.$$

On the set $\{0,1\}^\# = \{0,1\}^* \cup \{0,1\}^{\mathbb N}$ of all finite and infinite bit strings, let $w \preceq w'$ denote the order "$w$ is a prefix of $w'$". Equip $\{0,1\}^\#$ with the topology consisting of open sets of the form $\{w' \in \{0,1\}^\# : w \preceq w'\}$ for $w \in \{0,1\}^*$. It is not difficult to verify that $\{0,1\}^\#$ with this topology is a compact metric space, and that $\preceq$ is a closed subset of $\{0,1\}^\# \times \{0,1\}^\#$.

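On the set $(\{0,1\}^\#)^{\mathbb Z}$ of $\mathbb Z$-indexed vectors of finite and infinite bit strings, let $\preceq^{\mathbb Z}$ denote the (strong) product order of $\preceq$, i.e., we define

$$(w_k)_{k\in\mathbb Z} \preceq^{\mathbb Z} (w'_k)_{k\in\mathbb Z} \iff w_k \preceq w'_k \text{ for all } k \in \mathbb Z.$$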

For each $\ell \in \mathbb N$, let $Z_\ell$ be the following $\{0,1\}^*$-valued random variable: take independent unbiased bits $\nu_1, \nu_2, \nu_3, \ldots$, and set

$$Z_\ell = (\nu_1, \nu_2, \ldots, \nu_{T_\ell(\nu_1, \nu_2, \ldots)}),$$

where $T_\ell$ is the stopping time of the simulation of $q^\ell$ from independent unbiased bits as in Section 3. We call $Z_\ell$ (an instance of) an acceptable input for the simulation $(T_\ell, S_\ell)$. Let $(Z_{\ell,k})_{\ell\in\mathbb N,\,k\in\mathbb Z}$ be an infinite array of independent random variables, where $Z_{\ell,k}$ has the same distribution as $Z_\ell$. Denote $T_{\ell,k} = \mathrm{length}(Z_{\ell,k})$. Clearly $T_{\ell,k}$ is equal in law to $T_\ell(\nu_1, \nu_2, \ldots)$.
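To make the notion of an acceptable input concrete, here is a Python sketch of one standard way to simulate a finite distribution from unbiased bits: read bits until the dyadic interval they determine lies inside a single cell of the partition $0 = Q_0 < Q_1 < \cdots < Q_b = 1$ of $(0,1)$ induced by the distribution; the bits read up to that stopping time form the acceptable input. This illustrates the interval-refinement idea mentioned in Section 5 under our own choices (exact rational arithmetic via `fractions.Fraction`, symbols numbered $1, \ldots, b$); it is not claimed to be the exact simulation $(T_\ell, S_\ell)$ of Section 3.

```python
from fractions import Fraction
from itertools import product


def simulate(q, bits):
    """Draw one symbol from the law q = (q_1, ..., q_b) using unbiased bits.

    Returns (symbol, bits_read), or None if `bits` runs out before the stopping time.
    The first bits_read bits form the acceptable input.
    """
    cuts = [Fraction(0)]
    for mass in q:
        cuts.append(cuts[-1] + mass)          # 0 = Q_0 < Q_1 < ... < Q_b = 1
    lo, width, used = Fraction(0), Fraction(1), 0
    while True:
        # Stop as soon as the dyadic interval [lo, lo + width) fits in one cell.
        for j in range(1, len(cuts)):
            if cuts[j - 1] <= lo and lo + width <= cuts[j]:
                return j, used
        bit = next(bits, None)
        if bit is None:
            return None
        width /= 2
        lo += bit * width
        used += 1


def mass_decided_within(q, depth):
    """Exact probability of each output on the event that at most `depth` bits are read."""
    mass = [Fraction(0)] * len(q)
    for word in product((0, 1), repeat=depth):
        result = simulate(q, iter(word))
        if result is not None:
            mass[result[0] - 1] += Fraction(1, 2 ** depth)
    return mass


q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(mass_decided_within(q, 10))
```

The decided mass approaches $q$ from below as the depth grows. After $n$ bits, each undecided dyadic interval must straddle one of the $b - 1$ interior cut points, so the probability that more than $n$ bits are needed is at most $(b-1)2^{-n}$ for this particular scheme.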

Lemma 8 For each $n = 0, 1, 2, \ldots$ we have

(i) $Z_k^n$ is the concatenation of the bits $\epsilon_j(m)$ for all $(j, m) \in \mathcal G_k^n$, arranged by lexicographical order on $(j, m)$.

(ii) $(\mathcal G_k^n)_{k\in\mathbb Z}$ are disjoint sets.

(iii) The conditional distribution of $(Z_k^n)_{k\in\mathbb Z}$ given $(R_k)_{k\in\mathbb Z} = (r_k)_{k\in\mathbb Z}$ is a.s. stochastically dominated, in the order $\preceq^{\mathbb Z}$, by $(Z_{r_{k+1}-r_k,k})_{k\in\mathbb Z}$.

(iv) The conditional distribution of $(\mathrm{length}(Z_k^n))_{k\in\mathbb Z}$ given $(R_k)_{k\in\mathbb Z} = (r_k)_{k\in\mathbb Z}$ is a.s. stochastically dominated, in the order $\le^{\mathbb Z}$ (the product order of the usual order on numbers), by $(T_{r_{k+1}-r_k,k})_{k\in\mathbb Z}$.

Proof. Claim (i) follows trivially by induction on $n$, as the simulator locations $(J_k^n, M_k^n)$ are obviously increasing in the lexicographical order.

Claim (ii) follows by induction on $n$, by noting that $\mathcal G_k^n$ is always obtained from $\mathcal G_k^{n-1}$ by the addition of at most one location $(j, m)$, and, since we made allowance for the phenomenon of queue-ups, where multiple simulators are at the same location during Step (3, n), any given location $(j, m)$ is added to $\mathcal G_k^n$ for at most one value of $k \in \mathbb Z$, keeping the $\mathcal G_k^n$'s disjoint.

It remains to prove Claim (iii), which trivially implies (iv). To do this, let $(\omega_{k,j})_{k\in\mathbb Z,\,j\in\mathbb N}$ be an array of independent unbiased bits which are independent of all other random variables. For each $k \in \mathbb Z$ define $X_k = (x_{k,j})_{j\ge1} = Z_k^n * (\omega_{k,j})_{j\ge1}$, where $*$ denotes concatenation. Because of Lemma 6 together with claims (i) and (ii) proven above, it follows that, conditional on $(R_k)_{k\in\mathbb Z} = (r_k)_{k\in\mathbb Z}$, $(X_k)_{k\in\mathbb Z}$ is a sequence of independent infinite sequences of independent unbiased bits. Set $\xi_k = (x_{k,j})_{1\le j\le T_{r_{k+1}-r_k}(X_k)}$. Then (still working conditionally) the $(\xi_k)_{k\in\mathbb Z}$ are independent, and for each $k \in \mathbb Z$, $\xi_k$ has the distribution of $Z_{r_{k+1}-r_k}$. Also, from the construction necessarily $Z_k^n \preceq \xi_k$. We have constructed a realization of $(Z_{r_{k+1}-r_k,k})_{k\in\mathbb Z}$ that dominates $(Z_k^n)_{k\in\mathbb Z}$, thereby proving the stochastic domination claim. □

We shall use the following mass-transport lemma:

Lemma 9 Let $f : \mathbb Z \times \mathbb Z \to \mathbb R$ satisfy $f(x + c, y + c) = f(x, y)$ for all $x, y, c \in \mathbb Z$. Then for all $x \in \mathbb Z$,

$$\sum_{y\in\mathbb Z} f(x, y) = \sum_{y\in\mathbb Z} f(y, x).$$

Proof. Taking $c = x - y$ gives $f(x, y) = f(2x - y, x)$, so

$$\sum_{y\in\mathbb Z} f(x, y) = \sum_{y\in\mathbb Z} f(2x - y, x) = \sum_{u\in\mathbb Z} f(u, x).$$

□
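A quick way to see the lemma at work: when $f(x, y)$ depends only on $y - x$ (here through a hypothetical finitely supported weight table), the mass sent out of $x$ and the mass received at $x$ are the same finite sum read in two orders. In the application in Lemma 10, $f(k, j)$ is the expected number of bits that simulator $k$ reads from block $j$.

```python
def mass_transport_sums(weights, x, radius):
    """Compare sum_y f(x, y) with sum_y f(y, x) for f(x, y) = weights.get(y - x, 0).

    Such an f satisfies f(x + c, y + c) = f(x, y). The table `weights` must be supported
    in [-radius, radius], so that both sums reduce to finite sums over that window.
    """
    def f(u, v):
        return weights.get(v - u, 0)

    window = range(x - radius, x + radius + 1)
    sent = sum(f(x, y) for y in window)       # mass sent from x
    received = sum(f(y, x) for y in window)   # mass received at x
    return sent, received


# hypothetical weights: offset -> amount of mass sent that far to the right
weights = {0: 3, 1: 5, 4: 2, -2: 7}
print(mass_transport_sums(weights, x=10, radius=5))
```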



Lemma 10 Almost surely, for all $k \in \mathbb Z$ there exists an $n \ge 1$ for which $B_k$ was computed by Step (3, n).

Proof. As in the proof of Lemma 7, we introduce the probability measure $\tilde P$ with expectation operator $\tilde E$ under which $(W_k)_{k\in\mathbb Z}$ are i.i.d. Since $P$ is absolutely continuous with respect to $\tilde P$ (see the proof of Lemma 7), it is enough to prove that each $B_k$ is eventually computed, $\tilde P$-a.s.

For each $k \in \mathbb Z$, define $Z_k^\infty = \lim_{n\to\infty} Z_k^n$ (a possibly infinite bit sequence) and $\mathcal G_k^\infty = \lim_{n\to\infty}\mathcal G_k^n = \bigcup_{n=1}^{\infty}\mathcal G_k^n$. Define $f : \mathbb Z \times \mathbb Z \to \mathbb R$ by

$$f(k, j) = \tilde E\,\big|\{(j, i) : i \in \mathbb Z,\ 1 \le i \le V_j\} \cap \mathcal G_k^\infty\big| = \tilde E(\text{number of bits from } U_j \text{ read by the } k\text{-th simulator}).$$

The function $f$ satisfies the assumption of Lemma 9, since $(W_k)_{k\in\mathbb Z}$ is a stationary sequence under $\tilde P$, and it is easy to see from the construction that $(U_k, V_k, Z_k^n, \mathcal G_k^n)_{k\in\mathbb Z}$ are all generated from $(W_k)_{k\in\mathbb Z}$ in a shift-equivariant manner. Therefore for all $k \in \mathbb Z$, we have

$$\sum_{j\in\mathbb Z} f(k, j) = \sum_{j\in\mathbb Z} f(j, k). \qquad (5)$$

Denote the quantity in (5) by $g$ (by stationarity it does not depend on $k$). The left-hand side of (5) is equal to $\tilde E(|Z_k^\infty|)$, the expected number of bits eventually read by the $k$-th simulator. By Lemma 8(iv), $g \le E(T_{\Lambda_1})$. The right-hand side is equal to the expected total number of bits from the $k$-th associated bit string $U_k$ eventually used by any simulator, and clearly cannot exceed $\tilde E(V_k) = E(V_1)$. In particular, this implies that $g < \infty$, so almost surely $Z_k^\infty$ is a finite string.

Assume that with positive $\tilde P$-probability, $B_k$ is not computed by Step (3, n) for any $n$. Since $Z_k^\infty$ is finite, the only way for this to happen is for the $k$-th simulator to eventually fail to find any unused bits in the blocks to its right. Because of stationarity, by the ergodic theorem this implies that a.s. this happens to a positive proportion of the simulators. Therefore, a.s. all the bits are eventually used; in other words, $g = E(V_1)$. We have shown $E(V_1) = g \le E(T_{\Lambda_1})$, in contradiction to Lemma 7. So the assumption that $B_k$ was not computed with positive probability is false. □
Lemma 11 Conditioned on $(R_k)_{k\in\mathbb Z} = (r_k)_{k\in\mathbb Z}$, the random variables $(B_k)_{k\in\mathbb Z}$ are independent, and for each $k \in \mathbb Z$, $B_k$ has distribution $q^{r_{k+1}-r_k}$.

Proof. By Lemma 8, conditioned on $(R_k)_{k\in\mathbb Z} = (r_k)_{k\in\mathbb Z}$, the random vector $(Z_k^\infty)_{k\in\mathbb Z}$ is stochastically dominated in the product prefix order by $(Z_{r_{k+1}-r_k,k})_{k\in\mathbb Z}$. Assume that these two sequences are defined on the same space and that there is actual a.s. domination. For each $k \in \mathbb Z$, both $Z_k^\infty$ (by Lemma 10) and $Z_{r_{k+1}-r_k,k}$ are acceptable inputs for the simulation $(T_{r_{k+1}-r_k}, S_{r_{k+1}-r_k})$. Since the acceptable inputs for a given simulation form a prefix-free set ($T_{r_{k+1}-r_k}$ being a stopping time) and $Z_k^\infty \preceq Z_{r_{k+1}-r_k,k}$, necessarily $Z_k^\infty = Z_{r_{k+1}-r_k,k}$. We have shown that $(Z_k^\infty)_{k\in\mathbb Z} = (Z_{r_{k+1}-r_k,k})_{k\in\mathbb Z}$. Therefore (still working conditionally) the $(Z_k^\infty)_{k\in\mathbb Z}$ are independent and each $Z_k^\infty$ has the distribution of an acceptable input for the simulation $(T_{r_{k+1}-r_k}, S_{r_{k+1}-r_k})$. Therefore, since $B_k = S_{r_{k+1}-r_k}(Z_k^\infty)$, the $(B_k)_{k\in\mathbb Z}$ are independent, and for each $k \in \mathbb Z$, $B_k$ has distribution $q^{r_{k+1}-r_k}$. □

Lemma 12 $((\varphi(x))_i)_{i\in\mathbb Z}$ are i.i.d. with distribution $q$.

Proof. The mapping $i \mapsto (K(i), i - R_{K(i)})$ is obviously injective. This means that distinct coordinates $(\varphi(x))_i = \beta_{K(i)}(i - R_{K(i)})$ are assigned distinct symbols $\beta_k(j)$. Conditioned on $(R_k)_{k\in\mathbb Z}$, these symbols are all independent $B$-symbols with distribution $q$, by Lemma 11. This immediately implies the claim of the lemma. □

Lemma 13 $\varphi$ is translation-equivariant.

Proof. Let $x \in A^{\mathbb Z}$. Denote $x' = T(x)$. Our goal is to prove that for all $i \in \mathbb Z$, $(\varphi(x'))_i = (\varphi(x))_{i+1}$.

If $X$ is any of the various quantities in Table 1 which are implicitly dependent on $x$, denote by $X'$ the corresponding quantity taken as a function of $x'$ rather than $x$. Consider separately two cases:

Case 1: $R_1 > 0$. In this case, it is easy to check directly that for all $k \in \mathbb Z$,

$$R'_k = R_k - 1,\quad W'_k = W_k,\quad L'_k = L_k,\quad U'_k = U_k,\quad V'_k = V_k$$

(the markers are shifted by one). By induction on $n$, for all $k \in \mathbb Z$ and all $n \ge 0$,

$$((J_k^n)', (M_k^n)') = (J_k^n, M_k^n),\quad (Z_k^n)' = Z_k^n,\quad (\mathcal G_k^n)' = \mathcal G_k^n.$$

Therefore, for all $k \in \mathbb Z$, $B'_k = B_k$. Furthermore, $K(i)' = K(i+1)$. Therefore

$$(\varphi(x'))_i = \beta'_{K(i)'}(i - R'_{K(i)'}) = \beta_{K(i+1)}\big(i - (R_{K(i+1)} - 1)\big) = (\varphi(x))_{i+1}.$$

Case 2: $R_1 = 0$. In this case, check that for all $k \in \mathbb Z$,

$$R'_k = R_{k+1} - 1,\quad W'_k = W_{k+1},\quad L'_k = L_{k+1},\quad U'_k = U_{k+1},\quad V'_k = V_{k+1}$$

(the markers are shifted by 1, and the indexing of the blocks is shifted by 1). Therefore, by induction on $n$, for all $k \in \mathbb Z$ and for all $n \ge 0$,

$$((J_k^n)', (M_k^n)') = (J_{k+1}^n - 1, M_{k+1}^n),\quad (Z_k^n)' = Z_{k+1}^n,\quad (\mathcal G_k^n)' = \{(j - 1, m) : (j, m) \in \mathcal G_{k+1}^n\}.$$

Therefore, for all $k \in \mathbb Z$, $B'_k = B_{k+1}$. Check as before that now $K(i)' = K(i+1) - 1$. Therefore

$$(\varphi(x'))_i = \beta'_{K(i)'}(i - R'_{K(i)'}) = \beta_{K(i+1)}\big(i - (R_{(K(i+1)-1)+1} - 1)\big) = (\varphi(x))_{i+1}.$$

□

The following facts concerning random variables with exponential tails will be useful.

Lemma 14

(i) If $X, Y$ are real-valued random variables with exponential tails, then $X + Y$ has exponential tails.

(ii) If $X_1, X_2, \ldots$ are random variables with uniformly exponential tails (i.e., $(\sup_k P(|X_k| > n))_{n\in\mathbb N}$ decays exponentially), and $T$ is an $\mathbb N$-valued random variable with exponential tails, then the random variable $X_T$ has exponential tails.

(iii) If $X_1, X_2, \ldots$ are i.i.d. random variables with exponential tails and mean $\mu$, then, for any $c > 0$, the sequence

$$\Big(P\Big(\Big|\sum_{i=1}^{n} X_i - n\mu\Big| > nc\Big)\Big)_{n\in\mathbb N}$$

decays exponentially.

(iv) Suppose $X_1, X_2, \ldots$ are i.i.d. random variables with exponential tails, and $T$ is an $\mathbb N$-valued random variable with exponential tails. Denote $S_m = \sum_{k=1}^{m} X_k$. Then $S_T := \sum_{k=1}^{T} X_k$ has exponential tails.
Proof. Proof of (i): we have $P(|X + Y| > n) \le P(|X| > n/2) + P(|Y| > n/2)$, which decays exponentially.

Proof of (ii): we have

$$P(|X_T| > n) \le P(T > n) + P\Big(\bigcup_{k=1}^{n}\{|X_k| > n\}\Big) \le P(T > n) + n\sup_{1\le k\le n} P(|X_k| > n),$$

which decays exponentially.

Claim (iii) is a standard fact from large deviation theory; see, for example, [?], Corollary 27.4.

Proof of (iv): Let $c > 0$ and $0 < d < 1$ be such that $P(T > n) \le c\cdot d^n$ for all $n$. Denote $\alpha = 1/(2E|X_1|)$. Then

$$
\begin{aligned}
P(|S_T| > n) &= \sum_{m=1}^{\infty} P(T = m,\ |S_m| > n) \le \sum_{m=1}^{\infty} P\Big(T = m,\ \sum_{k=1}^{m}|X_k| > n\Big) \\
&\le \sum_{m=1}^{\lfloor\alpha n\rfloor} P\Big(T = m,\ \sum_{k=1}^{m}|X_k| > n\Big) + \sum_{m=\lfloor\alpha n\rfloor+1}^{\infty} P(T = m) \\
&\le \alpha n \cdot P\Big(\sum_{k=1}^{\lfloor\alpha n\rfloor}|X_k| - \lfloor\alpha n\rfloor E|X_1| > \frac{n}{2}\Big) + P(T > \alpha n).
\end{aligned}
$$

In the last bound, the second term decays exponentially in $n$. The first term decays exponentially by (iii) above, applied to the i.i.d. variables $|X_k|$. □
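A case in which (iv) can be checked by hand: if $T$ is geometric with parameter $\alpha$ on $\{1, 2, \ldots\}$ and, independently, the $X_k$ are geometric with parameter $\beta$, then $S_T$ is geometric with parameter $\alpha\beta$, so $P(S_T > n) = (1 - \alpha\beta)^n$. The Python sketch below computes the law of $S_T$ by direct convolution and can be compared with this closed form. Independence of $T$ from the $X_k$ is an extra assumption of the example; the lemma itself does not require it.

```python
def random_sum_pmf(alpha, beta, n_max):
    """pmf of S_T on {0, ..., n_max}, T ~ Geom(alpha), X_k ~ Geom(beta), all independent.

    Geom(r) denotes the law on {1, 2, ...} with P(= k) = (1 - r)^(k - 1) r.
    """
    x_pmf = [0.0] + [(1 - beta) ** (k - 1) * beta for k in range(1, n_max + 1)]
    conv = [1.0] + [0.0] * n_max          # pmf of S_0 = 0
    total = [0.0] * (n_max + 1)
    for m in range(1, n_max + 1):         # S_m >= m, so m > n_max cannot reach [0, n_max]
        # pmf of S_m = S_{m-1} + X_m, truncated to {0, ..., n_max}
        conv = [sum(conv[i] * x_pmf[s - i] for i in range(s + 1)) for s in range(n_max + 1)]
        p_t = (1 - alpha) ** (m - 1) * alpha
        for s in range(n_max + 1):
            total[s] += p_t * conv[s]     # add P(T = m) P(S_m = s)
    return total


pmf = random_sum_pmf(alpha=0.4, beta=0.5, n_max=20)
print([round(v, 6) for v in pmf[:6]])   # compare with 0.8 ** (s - 1) * 0.2
```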

For each $k \in \mathbb Z$, let $J_k^\infty = \lim_{n\to\infty} J_k^n$ be the value of $J_k^n$ for that $n$ for which $B_k$ was computed at Step (3, n); that is, the index of the rightmost block used by simulator $k$.

Lemma 15 $J_0^\infty$ has exponential tails.

Proof. Let $\tau$ be that $n \ge 1$ for which $B_0$ was computed at Step (3, n). If we define $I_0 = 0$,

$$I_1 = \min\{j \ge 0 : V_j > 0\},$$

and inductively for $k > 1$,

$$I_k = \min\{j > I_{k-1} : V_j > 0\},$$

then because of Lemma 6 it is easy to prove, in a manner similar to the proof of Lemma ??, that $(I_{k+1} - I_k)_{k\ge0}$ are independent, $(I_{k+1} - I_k)_{k\ge1}$ are identically distributed and have the geometric distribution $\mathrm{Geom}(p_0)$, where $p_0 = P(V_1 > 0)$ (so in particular they have exponential tails), and $I_1$ is stochastically dominated by $(I_2 - I_1) + 1$.

We know that

$$J_0^\infty \le I_{\tau+1} = \sum_{j=1}^{\tau+1}(I_j - I_{j-1}).$$

This is because $J_0^0 = I_1$, and at any Step (3, n), $1 \le n \le \tau$, we have $(J_0^n, M_0^n) = \mathrm{NEXT}(J_0^{n-1}, M_0^{n-1})$, so by induction, $J_0^n \le I_{n+1}$. In particular $J_0^\infty = J_0^\tau \le I_{\tau+1}$.

It follows, by Lemma 14(iv), that it is enough to prove that $\tau$ has exponential tails. Let $c > 0$, $\delta > 0$ be parameters whose values we will fix shortly. Write

$$P(\tau > n) = P(B_0 \text{ undefined by Step } (3,n)) = P(\tau > n,\ \mathrm{length}(Z_0^n) \le cn) + P(\tau > n,\ \mathrm{length}(Z_0^n) > cn). \qquad (6)$$

By Lemma 8(iv), the second term is at most $P(T_{\Lambda_0} > cn)$. Note that $T_{\Lambda_0}$ has exponential tails, since by Theorem ??(i), $P(T_k > n) \le b^k/2^n$, whence

$$P(T_{\Lambda_0} > n) \le P(\Lambda_0 > dn) + P\Big(\bigcup_{k\le dn}\{T_k > n\}\Big) \le P(\Lambda_0 > dn) + \frac{(dn + 1)\,b^{dn}}{2^n},$$

which can be seen to decay exponentially by taking $d = \log 2/(2\log b)$. It remains to deal with the first term in (6),



$$P(E_n) := P(\tau > n,\ \mathrm{length}(Z_0^n) \le cn).$$

Note the event inclusion

$$E_n \subset \Big\{\tau > n,\ \sum_{k=1}^{J_0^n - 1}\mathrm{length}(Z_k^n) \ge \sum_{k=1}^{J_0^n - 1}V_k - cn\Big\},$$

since on $E_n$, by Step (3, n) simulator number 0 would have used all the bits $\epsilon_j(m)$ for $1 \le j \le J_0^n - 1$, $1 \le m \le V_j$, which were left unused by the simulators numbered $1, \ldots, J_0^n - 1$, and there cannot be more than $cn$ such bits.

Next, observe that on the event $\{\tau > n\}$, because $(J_0^n, M_0^n) = \mathrm{NEXT}^n(J_0^0, M_0^0)$ (the $n$-th iteration of NEXT), we have the equality

$$J_0^n = \inf\Big\{j \ge 0 : \sum_{i=0}^{j}V_i > n\Big\} =: \theta_n$$

(we denote the quantity on the right-hand side by $\theta_n$). So we have shown

$$E_n \subset \Big\{\tau > n,\ \sum_{k=1}^{\theta_n - 1}\mathrm{length}(Z_k^n) \ge \sum_{k=1}^{\theta_n - 1}V_k - cn\Big\}.$$

The value of $\theta_n$ is, with probability exponentially close to 1, close to $(E(V_1))^{-1}n$. More precisely, for any $\delta > 0$,

$$\big\{\theta_n > ((E(V_1))^{-1} + \delta)n\big\} \subset \Big\{\sum_{0\le i\le((E(V_1))^{-1}+\delta)n}V_i \le n\Big\} \subset \Big\{\sum_{0\le i\le((E(V_1))^{-1}+\delta)n}(V_i - E(V_1)) \le -E(V_1)\,\delta n\Big\}$$

has probability which decays exponentially by Lemma 6 and Lemma 14(iii), and similarly, the probability of

$$\big\{\theta_n < ((E(V_1))^{-1} - \delta)n\big\} \subset \bigcup_{0\le j<((E(V_1))^{-1}-\delta)n}\Big\{\sum_{i=0}^{j}V_i > n\Big\}$$

decays exponentially. So, summarizing the latest developments, we can now write (with $j$ playing the role of $\theta_n - 1$)

$$E_n \subset \big\{|\theta_n - (E(V_1))^{-1}n| > \delta n\big\} \cup \bigcup_{|j - (E(V_1))^{-1}n| \le \delta n + 1}\Big\{\sum_{k=1}^{j}\mathrm{length}(Z_k^n) \ge \sum_{k=1}^{j}V_k - cn\Big\}.$$

The first event has probability which decays exponentially by the remarks above. The second event is a union over linearly many events, each of whose probability we can only increase by replacing $\mathrm{length}(Z_k^n)$ by $T'_k$, an independent copy of $T_{\Lambda_1}$, using Lemma 8(iv). So it is enough to prove that

$$\max_{|j - (E(V_1))^{-1}n| \le \delta n + 1} P\Big(\sum_{k=1}^{j}T'_k \ge \sum_{k=1}^{j}V_k - cn\Big)$$

decays exponentially in $n$. For this, invoking Lemma 7, choose $c$ and $\delta$ sufficiently small (the choices $\delta = (E(V_1))^{-1}/2$ and $c = (E(V_1) - E(T_{\Lambda_1}))/(3(1 + E(V_1)))$ will work) so that, for all sufficiently large $n$ and all $j$ with $|j - (E(V_1))^{-1}n| \le \delta n + 1$, the inclusion

$$\Big\{\sum_{k=1}^{j}T'_k \ge \sum_{k=1}^{j}V_k - cn\Big\} \subset \Big\{\sum_{k=1}^{j}(T'_k - E(T_{\Lambda_1})) \ge cj\Big\} \cup \Big\{\sum_{k=1}^{j}(V_k - E(V_1)) \le -cj\Big\}$$

will hold, and use Lemma 14(iii). □
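The concentration of $\theta_n$ around $(E(V_1))^{-1}n$ is a renewal-type law of large numbers. The Python sketch below simulates $\theta_n$ for a hypothetical law of the $V_i$ (uniform on $\{0, \ldots, 4\}$, mean 2; unlike in the construction, $V_0$ is drawn from the same law for simplicity) and evaluates the choices of $\delta$ and $c$ made in the proof for illustrative values of $E(V_1)$ and $E(T_{\Lambda_1})$.

```python
import random


def theta(n, sample_v, rng):
    """theta_n = inf{j >= 0 : V_0 + ... + V_j > n} for i.i.d. draws V_i = sample_v(rng)."""
    total, j = 0, -1
    while total <= n:
        j += 1
        total += sample_v(rng)
    return j


def parameters(ev, et):
    """delta = (E V_1)^(-1) / 2 and c = (E V_1 - E T) / (3 (1 + E V_1)), as in the proof."""
    if ev <= et:
        raise ValueError("needs E(V_1) > E(T_{Lambda_1}), as in (4)")
    return 0.5 / ev, (ev - et) / (3 * (1 + ev))


def draw_v(r):
    return r.randint(0, 4)   # hypothetical law of V_i: uniform on {0, ..., 4}


rng = random.Random(2024)
for n in (1_000, 10_000, 100_000):
    print(n, theta(n, draw_v, rng) / n)   # should be close to 1 / E(V_1) = 0.5
print(parameters(ev=2.0, et=1.5))
```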

Lemma 16 $\varphi$ is a finitary homomorphism from $B(p)$ to $B(q)$ with exponential tails.

Proof. From Lemmas 12 and 13 it follows that $\varphi$ is a homomorphism from $B(p)$ to $B(q)$. From the definition $(\varphi(x))_0 = \beta_{K(0)}(-R_{K(0)}) = \beta_0(-R_0)$ (since $K(0) = 0$) we see that $(\varphi(x))_0$ is determined by the input symbols $(x_i)_{R_0 \le i \le R_{J_0^\infty+1}+t}$. This proves that $\varphi$ is finitary, with a coding length $N_\varphi$ that satisfies

$$N_\varphi \le \max(-R_0,\ R_{J_0^\infty+1} + t) \le R_{J_0^\infty+2} - R_0 = \sum_{j=1}^{J_0^\infty+1}(R_{j+1} - R_j) + (R_1 - R_0).$$

Since by Lemma 5 the $(R_{j+1} - R_j)_{j\ge1}$ are i.i.d. with exponential tails, and $R_1 - R_0$ has exponential tails, it follows from Lemma 14(i), (iv) and Lemma 15 that $N_\varphi$ has exponential tails. □



5 Extension to Markov chains 

We indicate here the changes in the ideas presented above required to prove Theorem 2. We omit the details of the proofs, which are similar to those above.
5.1 Coding from a Markov source

If the Bernoulli source $B(p)$ is replaced by a Markov source, two changes to the construction are needed. First, the marker pattern "211...1" must be replaced with a sequence which is assigned positive probability by the Markov chain. Here, we must either give up the source-universality property, or make the rather restrictive assumption that all the entries in the matrix $\alpha$ are positive.

Second, it is necessary to replace the function $F_{n,t}$ that produces independent unbiased bits from a pattern-free Bernoulli block with a new function, $F'_{n,t}$, designed to do the same for a pattern-free Markov block conditioned to begin with a "1" (which we assume without loss of generality to be the last symbol in the left marker) and to end with a "2" (the first symbol in the right marker). Note that the construction of $F_{n,t}$ used only the symmetries of the distribution $p_{n,t}$, namely the fact that the space $E_{n,t}$ can be partitioned into classes of equiprobable elements, since the probability of an element only depends on the count of the different $A$-symbols, and not on the order of their appearance.

For a Markov source, the probability of an element in $E_{n,t}$ will depend on the count of adjacent pairs of symbols. Thus, there will again be classes of equiprobable elements. The number of classes, which bounds the range of the complementary function $G'_{n,t}$ (and hence the amount of lost entropy; see the proof of Lemma 7), is at most $n^{a^2}$. All the proofs carry through identically to the Bernoulli case.
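The claim that a Markov path's probability depends only on its counts of adjacent pairs can be checked exhaustively for short words. This Python sketch groups the words of length $n$ that begin with 1 and end with 2 by their pair counts and confirms that each group consists of equiprobable words, given the first symbol. The transition matrix `ALPHA` is a hypothetical example, and the pattern-freeness condition defining $E_{n,t}$ is ignored here.

```python
from collections import Counter, defaultdict
from itertools import product
from math import isclose, prod


def pair_counts(seq):
    """Multiset of adjacent pairs (u, v) occurring in seq, as a hashable key."""
    return frozenset(Counter(zip(seq, seq[1:])).items())


def path_prob(seq, alpha):
    """P(X_2, ..., X_n = seq[1:] | X_1 = seq[0]) for transition matrix alpha on {1, ..., a}."""
    return prod(alpha[u - 1][v - 1] for u, v in zip(seq, seq[1:]))


def equiprobable_classes(n, a, alpha, first=1, last=2):
    """Group the length-n words over {1, ..., a} from `first` to `last` by their pair counts."""
    classes = defaultdict(list)
    for middle in product(range(1, a + 1), repeat=n - 2):
        seq = (first, *middle, last)
        classes[pair_counts(seq)].append(path_prob(seq, alpha))
    return classes


ALPHA = [[0.5, 0.3, 0.2],   # hypothetical transition matrix on A = {1, 2, 3}
         [0.1, 0.6, 0.3],
         [0.4, 0.4, 0.2]]
classes = equiprobable_classes(7, 3, ALPHA)
print(len(classes), all(all(isclose(x, v[0]) for x in v) for v in classes.values()))
```

Given the first symbol, the probability of a word is $\prod_{(u,v)} \alpha(u,v)^{N_{uv}}$, where $N_{uv}$ is the number of adjacent pairs $(u, v)$, which is why the pair counts determine it.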

5.2 Coding to a Markov chain 

Things get more complicated when the target process is Markov. Here, it is not enough for each block to independently generate the $B$-symbols required to fill its spaces, since one must make sure that the transition probabilities across the boundaries between blocks are correct. We propose the following solution to this problem.

First, we need a process version of Theorem ??. That is, given a sequence of random variables $Y_1, Y_2, \ldots$, not necessarily independent, taking values in a finite alphabet $B$, one may construct a sequence of simulations $(T_k, S_k)_{k\in\mathbb N}$, such that the stopping times $T_k$ are increasing, and for each $k$, $(T_k, S_k)$ is a simulation of the distribution of $(Y_1, Y_2, \ldots, Y_k)$ from independent unbiased bits. Each simulation is efficient, in the sense of Theorem ??(ii). The proof is straightforward, by successively refining the partition $0 = Q_0 < Q_1 < \cdots < Q_b = 1$ of $(0, 1)$ used in the proof of Theorem ??.
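A minimal Python sketch of the process version, assuming simulations by interval refinement as the text suggests: the cell assigned to $(Y_1, \ldots, Y_k)$ is split in proportion to the conditional law of $Y_{k+1}$, and bits continue to be read until the dyadic interval lies in one sub-cell, so the stopping times satisfy $T_1 \le T_2 \le \cdots$. The callback `cond_prob`, the initial law `INIT` and the matrix `BETA` are hypothetical examples.

```python
from fractions import Fraction
from itertools import product


def nested_simulation(cond_prob, alphabet, k_max, bits):
    """Simulate Y_1, ..., Y_{k_max} from unbiased bits by successive refinement.

    cond_prob(prefix, y) = P(Y_{len(prefix)+1} = y | (Y_1, ...) = prefix).
    Returns [(Y_1, T_1), (Y_2, T_2), ...]; the list is shorter than k_max if `bits` runs out.
    """
    cell_lo, cell_hi = Fraction(0), Fraction(1)    # cell of the prefix simulated so far
    lo, width, used = Fraction(0), Fraction(1), 0  # dyadic interval given by the bits read
    prefix, out = [], []
    for _ in range(k_max):
        subcells, left = [], cell_lo
        for y in alphabet:                         # refine the current cell
            right = left + (cell_hi - cell_lo) * cond_prob(tuple(prefix), y)
            subcells.append((y, left, right))
            left = right
        while True:
            hit = next(((y, a, b) for y, a, b in subcells
                        if a <= lo and lo + width <= b), None)
            if hit is not None:
                break
            bit = next(bits, None)
            if bit is None:
                return out
            width /= 2
            lo += bit * width
            used += 1
        y, cell_lo, cell_hi = hit
        prefix.append(y)
        out.append((y, used))
    return out


INIT = {1: Fraction(1, 4), 2: Fraction(3, 4)}       # hypothetical law of Y_1
BETA = {1: {1: Fraction(1, 2), 2: Fraction(1, 2)},  # hypothetical transition matrix
        2: {1: Fraction(1, 3), 2: Fraction(2, 3)}}


def markov_cond(prefix, y):
    return INIT[y] if not prefix else BETA[prefix[-1]][y]


def decided_mass(depth, k):
    """Exact law of (Y_1, ..., Y_k) restricted to the event T_k <= depth."""
    mass = {}
    for word in product((0, 1), repeat=depth):
        out = nested_simulation(markov_cond, (1, 2), k, iter(word))
        if len(out) == k:
            key = tuple(y for y, _ in out)
            mass[key] = mass.get(key, 0) + Fraction(1, 2 ** depth)
    return mass


print(nested_simulation(markov_cond, (1, 2), 3, iter([1, 0, 1, 1, 0, 0, 1])))
print(decided_mass(8, 2))
```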

Since the Markov matrix $\beta$ is irreducible and aperiodic, there exists an $m \ge 1$ for which all the entries of $\beta^m$ are positive. For each block $k \in \mathbb Z$, the $k$-th simulator will start by generating, using the independent unbiased bits $U_k$ at her disposal (and, if necessary, bits from the blocks to her right), $b$ separate Markov trajectories of length $m$, $(z_{k,i,j})_{1\le i\le b,\,1\le j\le m}$, such that for each $1 \le i \le b$, $(z_{k,i,j})_{1\le j\le m}$ is a finite Markov chain that starts with the symbol $i$ and has transition probabilities given by the matrix $\beta$. We call these finite chains the preamble chains. One of them will later be chosen to fill the first $m$ places in the $k$-th block, but at the moment we do not know which.

Next, we want the $k$-th simulator to compute symbols to fill the remainder of her block. This can be done if coupling was achieved in the sequences $(z_{k,i,j})$, i.e., if all the symbols $(z_{k,i,m})_{1\le i\le b}$ are identical, since in this case, no matter which of the preamble chains we later choose, we will need the same conditional distribution to compute the $(m+1)$-th symbol in the block, then the $(m+2)$-th, and so on.

Because of our choice of $m$, we know that coupling is achieved with positive probability. So a positive proportion of the simulators will be able to continue and compute symbols to fill their blocks, using the nested simulations described above. But each simulator that filled her block also determined, for the block to her right, which of the preamble chains to use: the one that corresponds to the last symbol in the block that was filled. For each block for which the preamble was determined in this second round of computations, one may now proceed to simulate the remainder of the block. This then determines the preamble of more blocks, for which the block remainder is then computed. By iterating this process, the choice of preamble chains is propagated to the right until a single output sequence is determined.
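The preamble length and the resulting coupling probability are easy to compute for a given matrix. In this Python sketch the preamble chains are assumed to be generated independently of each other (the text only calls them separate), the coupling time is counted as $m$ transitions after the initial symbol $i$, and the matrix is a hypothetical example. The search for $m$ is finite because a primitive $b \times b$ matrix has all entries of $\beta^m$ positive for some $m \le (b-1)^2 + 1$ (Wielandt's bound).

```python
from fractions import Fraction


def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def preamble_length(beta):
    """Smallest m >= 1 such that every entry of beta^m is positive, together with beta^m."""
    b = len(beta)
    power = beta
    for m in range(1, (b - 1) ** 2 + 2):
        if all(entry > 0 for row in power for entry in row):
            return m, power
        power = mat_mul(power, beta)
    raise ValueError("beta is not irreducible and aperiodic")


def coupling_probability(beta_m):
    """P(all b independent chains, started at 1, ..., b, occupy one common symbol after m steps)."""
    b = len(beta_m)
    total = Fraction(0)
    for s in range(b):
        term = Fraction(1)
        for i in range(b):
            term *= beta_m[i][s]   # chain started at i is at s after m steps
        total += term
    return total


beta = [[Fraction(0), Fraction(1)],          # hypothetical irreducible aperiodic matrix
        [Fraction(1, 2), Fraction(1, 2)]]
m, beta_m = preamble_length(beta)
print(m, coupling_probability(beta_m))
```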

The computation of the preamble chains uses a fixed amount of entropy per block. Since the blocks may be made arbitrarily large in expectation by choosing a long enough marker, the loss in entropy can be made negligible. The computation of the remainder of each block uses nested simulations, which are efficient. Therefore it can be shown that the total entropy loss is small, and that the simulation terminates a.s. The output sequence is clearly a stationary process, and by the construction its transition probabilities are exactly given by the transition matrix $\beta$. Therefore, it is the stationary Markov shift $\mathcal M(\beta)$. The construction is a finitary homomorphism, which can be shown to have exponential tails using large-deviations estimates as in the Bernoulli case.



Open Problems

(i) Do there exist source-universal finitary isomorphisms? More specifically, if the Bernoulli sources $B(p)$, $B(p')$, $B(q)$ all have equal entropy, does there exist a finitary map (as defined in the introduction) which is simultaneously a finitary isomorphism from $B(p)$ to $B(q)$, and from $B(p')$ to $B(q)$?

Remark. If the function is not required to be a finitary map, the answer to the above question is positive, for a trivial reason. The function can simply use the law of large numbers to discern whether the input is in the almost-sure set of $B(p)$ or of $B(p')$, and apply one of two Keane-Smorodinsky [?] finitary isomorphisms accordingly.

(ii) Construct finitary isomorphisms between general Bernoulli sources with explicit bounds on the tails. (See [5], [7], [?] for such constructions in specific cases.)